🔥 Training Uganda Clinical Model (FULL)...
WARNING: BNB_CUDA_VERSION=125 environment variable detected; loading libbitsandbytes_cuda125.so.
This can be used to load a bitsandbytes version built with a CUDA version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
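This BNB_CUDA_VERSION override warning is repeated once per process below. If the override is unintentional, the warning's own suggestion (`export BNB_CUDA_VERSION=`) can be applied in the launch shell, or the variable can be dropped before bitsandbytes is imported. A minimal sketch:

```python
import os

# Drop the override (if present) so bitsandbytes selects the library that
# matches the PyTorch CUDA build instead of libbitsandbytes_cuda125.so.
os.environ.pop("BNB_CUDA_VERSION", None)

import bitsandbytes as bnb  # noqa: E402 -- imported after the env cleanup on purpose
```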
World size: 2
Creating model 0
Loading model 0
Loading & Quantizing Model Shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading & Quantizing Model Shards: 50%|█████ | 1/2 [00:15<00:15, 15.16s/it]
Loading & Quantizing Model Shards: 100%|██████████| 2/2 [00:26<00:00, 13.00s/it]
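The two checkpoint shards above are quantized with bitsandbytes as they are read. The script here appears to do this shard-by-shard itself for FSDP; the more common single-process equivalent goes through transformers, typically with 4-bit NF4 in a QLoRA/DoRA-style setup. A sketch under those assumptions (the base checkpoint name is a placeholder, since the log does not identify it):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder model id; the log does not name the base checkpoint.
MODEL_ID = "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",              # usual QLoRA choice (assumption here)
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for matmuls
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```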
/usr/local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
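The UserWarning above is PyTorch asking for an explicit device when the process group is created. A minimal sketch, assuming a torchrun-style launch that sets LOCAL_RANK and a recent PyTorch release where `init_process_group` accepts `device_id`; this is also where the `World size: 2` figure reported earlier comes from:

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

# Supplying device_id ties the NCCL process group to this rank's GPU and
# avoids the "No device id is provided" warning.
dist.init_process_group(backend="nccl", device_id=torch.device(f"cuda:{local_rank}"))

print(f"World size: {dist.get_world_size()}")  # 2 in this run
```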
Rank 0: Model created: 0.107 GiB
Using BNB DORA 0
Rank 0: LoRA layers added: 0.107 GiB
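"Using BNB DORA" indicates DoRA-style adapters placed on top of the bitsandbytes-quantized base weights, which is consistent with the reported footprint being essentially unchanged (0.107 GiB before and after). The script appears to use its own adapter implementation; as a rough equivalent, recent PEFT versions expose the same idea via `use_dora=True`. All hyperparameters below are placeholders, and `model` is the quantized base from the earlier sketch:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                       # placeholder rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder targets
    use_dora=True,              # weight-decomposed LoRA (DoRA)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```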
Wrapping model w/ FSDP 0
Rank 0: Wrapped model: 1.625 GiB
Applying activation checkpointing 0
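The jump from 0.107 GiB to 1.625 GiB at wrap time presumably reflects FSDP materializing its sharded flat parameters and communication buffers; activation checkpointing is then applied to trade recomputation for activation memory. A minimal sketch of both steps, assuming a Llama-style base model (the decoder-layer class is an assumption, as is the mixed-precision policy):

```python
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from transformers.models.llama.modeling_llama import LlamaDecoderLayer  # assumption

# Shard each decoder layer across the 2 ranks.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={LlamaDecoderLayer}
)
model = FSDP(
    model,                                   # adapter-augmented model from above
    auto_wrap_policy=wrap_policy,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
    device_id=torch.cuda.current_device(),
)

# Recompute each decoder layer's activations during the backward pass
# instead of keeping them resident.
apply_activation_checkpointing(
    model,
    checkpoint_wrapper_fn=checkpoint_wrapper,
    check_fn=lambda module: isinstance(module, LlamaDecoderLayer),
)
```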
Total Training Steps: 99
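The 99 total steps correspond to three epochs of 33 optimizer steps each, which matches the epoch counter rolling over at steps 33 and 66 below. A minimal sketch of a loop that produces the `Epoch E, Loss L, LR x.xxe-xx` progress text, with `model`, `optimizer`, and `train_dataloader` assumed to already be set up (the log shows nothing about the optimizer beyond its 1.00e-05 learning rate):

```python
from tqdm import tqdm

def train(model, optimizer, train_dataloader, epochs=3, rank=0):
    """Run `epochs` passes and mirror the progress text seen in this log."""
    progress = tqdm(total=epochs * len(train_dataloader), disable=(rank != 0))
    for epoch in range(epochs):
        for batch in train_dataloader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            lr = optimizer.param_groups[0]["lr"]
            progress.set_description(f"Epoch {epoch}, Loss {loss.item():.3f}, LR {lr:.2e}")
            progress.update(1)
    progress.close()
```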
Epoch 0, Loss 0.000: 0%| | 0/99 [00:00<?, ?it/s]
Epoch 0, Loss 1.426, LR 1.00e-05: 1%| | 1/99 [00:08<13:29, 8.26s/it]
Epoch 0, Loss 1.409, LR 1.00e-05: 2%|█ | 2/99 [00:10<07:31, 4.66s/it]
Epoch 0, Loss 1.369, LR 1.00e-05: 3%|█ | 3/99 [00:12<05:27, 3.41s/it]
Epoch 0, Loss 1.200, LR 1.00e-05: 4%|█ | 4/99 [00:14<04:27, 2.82s/it]
Epoch 0, Loss 1.123, LR 1.00e-05: 5%|█ | 5/99 [00:16<03:55, 2.51s/it]
Epoch 0, Loss 1.126, LR 1.00e-05: 6%|█ | 6/99 [00:18<03:37, 2.34s/it]
Epoch 0, Loss 0.955, LR 1.00e-05: 7%|█ | 7/99 [00:20<03:23, 2.21s/it]
Epoch 0, Loss 0.910, LR 1.00e-05: 8%|█ | 8/99 [00:22<03:20, 2.21s/it]
Epoch 0, Loss 0.948, LR 1.00e-05: 9%|█ | 9/99 [00:24<03:12, 2.14s/it]
Epoch 0, Loss 0.674, LR 1.00e-05: 10%|█ | 10/99 [00:26<03:05, 2.08s/it]
Epoch 0, Loss 0.956, LR 1.00e-05: 11%|█ | 11/99 [00:28<02:58, 2.03s/it]
Epoch 0, Loss 0.994, LR 1.00e-05: 12%|██ | 12/99 [00:30<02:54, 2.01s/it]
Epoch 0, Loss 0.803, LR 1.00e-05: 13%|██ | 13/99 [00:32<02:52, 2.00s/it]
Epoch 0, Loss 0.902, LR 1.00e-05: 14%|██ | 14/99 [00:34<02:49, 2.00s/it]
Epoch 0, Loss 1.091, LR 1.00e-05: 15%|██ | 15/99 [00:36<02:53, 2.07s/it]
Epoch 0, Loss 0.834, LR 1.00e-05: 16%|██ | 16/99 [00:38<02:49, 2.04s/it]
Epoch 0, Loss 1.042, LR 1.00e-05: 17%|██ | 17/99 [00:40<02:44, 2.00s/it]
Epoch 0, Loss 0.731, LR 1.00e-05: 18%|██ | 18/99 [00:42<02:39, 1.97s/it]
Epoch 0, Loss 1.042, LR 1.00e-05: 19%|██ | 19/99 [00:44<02:35, 1.95s/it]
Epoch 0, Loss 1.062, LR 1.00e-05: 20%|██ | 20/99 [00:46<02:33, 1.94s/it]
Epoch 0, Loss 0.817, LR 1.00e-05: 21%|██ | 21/99 [00:47<02:31, 1.95s/it]
Epoch 0, Loss 0.928, LR 1.00e-05: 22%|███ | 22/99 [00:50<02:36, 2.04s/it]
Epoch 0, Loss 1.187, LR 1.00e-05: 23%|███ | 23/99 [00:52<02:32, 2.01s/it]
Epoch 0, Loss 1.039, LR 1.00e-05: 24%|███ | 24/99 [00:54<02:29, 1.99s/it]
Epoch 0, Loss 0.800, LR 1.00e-05: 25%|███ | 25/99 [00:56<02:27, 1.99s/it]
Epoch 0, Loss 0.946, LR 1.00e-05: 26%|███ | 26/99 [00:58<02:24, 1.98s/it]
Epoch 0, Loss 1.006, LR 1.00e-05: 27%|███ | 27/99 [01:00<02:23, 1.99s/it]
Epoch 0, Loss 0.677, LR 1.00e-05: 28%|███ | 28/99 [01:01<02:20, 1.98s/it]
Epoch 0, Loss 1.013, LR 1.00e-05: 29%|███ | 29/99 [01:04<02:22, 2.04s/it]
Epoch 0, Loss 0.918, LR 1.00e-05: 30%|███ | 30/99 [01:06<02:18, 2.01s/it]
Epoch 0, Loss 0.839, LR 1.00e-05: 31%|████ | 31/99 [01:08<02:16, 2.01s/it]
Epoch 0, Loss 1.119, LR 1.00e-05: 32%|████ | 32/99 [01:10<02:15, 2.02s/it]
Epoch 0, Loss 0.769, LR 1.00e-05: 33%|████ | 33/99 [01:12<02:13, 2.02s/it]
Epoch 1, Loss 0.613, LR 1.00e-05: 34%|████ | 34/99 [01:14<02:12, 2.03s/it]
Epoch 1, Loss 0.661, LR 1.00e-05: 35%|████ | 35/99 [01:16<02:09, 2.02s/it]
Epoch 1, Loss 0.629, LR 1.00e-05: 36%|████ | 36/99 [01:18<02:13, 2.13s/it]
Epoch 1, Loss 0.638, LR 1.00e-05: 37%|████ | 37/99 [01:20<02:10, 2.10s/it]
Epoch 1, Loss 0.596, LR 1.00e-05: 38%|████ | 38/99 [01:22<02:05, 2.05s/it]
Epoch 1, Loss 0.618, LR 1.00e-05: 39%|████ | 39/99 [01:24<02:01, 2.02s/it]
Epoch 1, Loss 0.539, LR 1.00e-05: 40%|████ | 40/99 [01:26<01:58, 2.01s/it]
Epoch 1, Loss 0.418, LR 1.00e-05: 41%|█████ | 41/99 [01:28<01:56, 2.02s/it]
Epoch 1, Loss 0.477, LR 1.00e-05: 42%|█████ | 42/99 [01:30<01:54, 2.01s/it]
Epoch 1, Loss 0.350, LR 1.00e-05: 43%|█████ | 43/99 [01:32<01:56, 2.09s/it]
Epoch 1, Loss 0.469, LR 1.00e-05: 44%|█████ | 44/99 [01:34<01:52, 2.04s/it]
Epoch 1, Loss 0.488, LR 1.00e-05: 45%|█████ | 45/99 [01:36<01:49, 2.02s/it]
Epoch 1, Loss 0.434, LR 1.00e-05: 46%|█████ | 46/99 [01:38<01:45, 1.99s/it]
Epoch 1, Loss 0.382, LR 1.00e-05: 47%|█████ | 47/99 [01:40<01:42, 1.98s/it]
Epoch 1, Loss 0.577, LR 1.00e-05: 48%|█████ | 48/99 [01:42<01:39, 1.95s/it]
Epoch 1, Loss 0.414, LR 1.00e-05: 49%|█████ | 49/99 [01:44<01:36, 1.93s/it]
Epoch 1, Loss 0.575, LR 1.00e-05: 51%|█████ | 50/99 [01:46<01:38, 2.01s/it]
Epoch 1, Loss 0.422, LR 1.00e-05: 52%|██████ | 51/99 [01:48<01:34, 1.97s/it]
Epoch 1, Loss 0.690, LR 1.00e-05: 53%|██████ | 52/99 [01:50<01:31, 1.94s/it]
Epoch 1, Loss 0.683, LR 1.00e-05: 54%|██████ | 53/99 [01:52<01:28, 1.91s/it]
Epoch 1, Loss 0.494, LR 1.00e-05: 55%|██████ | 54/99 [01:54<01:25, 1.91s/it]
Epoch 1, Loss 0.613, LR 1.00e-05: 56%|██████ | 55/99 [01:55<01:23, 1.90s/it]
Epoch 1, Loss 0.723, LR 1.00e-05: 57%|██████ | 56/99 [01:57<01:21, 1.90s/it]
Epoch 1, Loss 0.645, LR 1.00e-05: 58%|██████ | 57/99 [01:59<01:22, 1.97s/it]
Epoch 1, Loss 0.494, LR 1.00e-05: 59%|██████ | 58/99 [02:01<01:20, 1.96s/it]
Epoch 1, Loss 0.573, LR 1.00e-05: 60%|██████ | 59/99 [02:03<01:17, 1.94s/it]
Epoch 1, Loss 0.595, LR 1.00e-05: 61%|██████ | 60/99 [02:05<01:15, 1.94s/it]
Epoch 1, Loss 0.381, LR 1.00e-05: 62%|███████ | 61/99 [02:07<01:13, 1.94s/it]
Epoch 1, Loss 0.641, LR 1.00e-05: 63%|███████ | 62/99 [02:09<01:11, 1.93s/it]
Epoch 1, Loss 0.548, LR 1.00e-05: 64%|███████ | 63/99 [02:11<01:10, 1.95s/it]
Epoch 1, Loss 0.494, LR 1.00e-05: 65%|███████ | 64/99 [02:13<01:11, 2.04s/it]
Epoch 1, Loss 0.712, LR 1.00e-05: 66%|███████ | 65/99 [02:15<01:07, 1.98s/it]
Epoch 1, Loss 0.335, LR 1.00e-05: 67%|███████ | 66/99 [02:17<01:04, 1.95s/it]
Epoch 2, Loss 0.411, LR 1.00e-05: 68%|███████ | 67/99 [02:19<01:01, 1.93s/it]
Epoch 2, Loss 0.455, LR 1.00e-05: 69%|███████ | 68/99 [02:21<01:00, 1.96s/it]
Epoch 2, Loss 0.369, LR 1.00e-05: 70%|███████ | 69/99 [02:23<00:57, 1.93s/it]
Epoch 2, Loss 0.384, LR 1.00e-05: 71%|███████ | 70/99 [02:25<00:55, 1.91s/it]
Epoch 2, Loss 0.370, LR 1.00e-05: 72%|████████ | 71/99 [02:27<00:55, 1.99s/it]
Epoch 2, Loss 0.396, LR 1.00e-05: 73%|████████ | 72/99 [02:29<00:53, 1.97s/it]
Epoch 2, Loss 0.337, LR 1.00e-05: 74%|████████ | 73/99 [02:31<00:50, 1.94s/it]
Epoch 2, Loss 0.206, LR 1.00e-05: 75%|████████ | 74/99 [02:33<00:48, 1.92s/it]
Epoch 2, Loss 0.247, LR 1.00e-05: 76%|████████ | 75/99 [02:34<00:45, 1.91s/it]
Epoch 2, Loss 0.183, LR 1.00e-05: 77%|████████ | 76/99 [02:36<00:43, 1.91s/it]
Epoch 2, Loss 0.236, LR 1.00e-05: 78%|████████ | 77/99 [02:38<00:41, 1.90s/it]
Epoch 2, Loss 0.220, LR 1.00e-05: 79%|████████ | 78/99 [02:40<00:42, 2.01s/it]
Epoch 2, Loss 0.186, LR 1.00e-05: 80%|████████ | 79/99 [02:42<00:40, 2.01s/it]
Epoch 2, Loss 0.251, LR 1.00e-05: 81%|████████ | 80/99 [02:44<00:37, 1.99s/it]
Epoch 2, Loss 0.315, LR 1.00e-05: 82%|█████████ | 81/99 [02:46<00:35, 1.95s/it]
Epoch 2, Loss 0.147, LR 1.00e-05: 83%|█████████ | 82/99 [02:48<00:33, 1.94s/it]
Epoch 2, Loss 0.208, LR 1.00e-05: 84%|█████████ | 83/99 [02:50<00:31, 1.94s/it]
Epoch 2, Loss 0.187, LR 1.00e-05: 85%|█████████ | 84/99 [02:52<00:29, 1.94s/it]
Epoch 2, Loss 0.394, LR 1.00e-05: 86%|█████████ | 85/99 [02:54<00:28, 2.03s/it]
Epoch 2, Loss 0.326, LR 1.00e-05: 87%|█████████ | 86/99 [02:56<00:25, 1.99s/it]
Epoch 2, Loss 0.227, LR 1.00e-05: 88%|█████████ | 87/99 [02:58<00:23, 1.97s/it]
Epoch 2, Loss 0.256, LR 1.00e-05: 89%|█████████ | 88/99 [03:00<00:21, 1.96s/it]
Epoch 2, Loss 0.334, LR 1.00e-05: 90%|█████████ | 89/99 [03:02<00:19, 1.96s/it]
Epoch 2, Loss 0.330, LR 1.00e-05: 91%|█████████ | 90/99 [03:04<00:17, 1.94s/it]
Epoch 2, Loss 0.259, LR 1.00e-05: 92%|██████████| 91/99 [03:06<00:15, 1.94s/it]
Epoch 2, Loss 0.283, LR 1.00e-05: 93%|██████████| 92/99 [03:08<00:14, 2.02s/it]
Epoch 2, Loss 0.271, LR 1.00e-05: 94%|██████████| 93/99 [03:11<00:13, 2.17s/it]
Epoch 2, Loss 0.180, LR 1.00e-05: 95%|██████████| 94/99 [03:21<00:23, 4.62s/it]
Epoch 2, Loss 0.325, LR 1.00e-05: 96%|██████████| 95/99 [03:23<00:15, 3.83s/it]
Epoch 2, Loss 0.280, LR 1.00e-05: 97%|██████████| 96/99 [03:25<00:09, 3.27s/it]
Epoch 2, Loss 0.228, LR 1.00e-05: 98%|██████████| 97/99 [03:27<00:05, 2.90s/it]
Epoch 2, Loss 0.380, LR 1.00e-05: 99%|██████████| 98/99 [03:29<00:02, 2.65s/it]
Epoch 2, Loss 0.106, LR 1.00e-05: 100%|██████████| 99/99 [03:31<00:00, 2.56s/it]
/usr/local/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:680: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
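The FutureWarning above points at the newer `torch.distributed.checkpoint.state_dict` helpers. A minimal sketch of gathering a full, CPU-resident state dict from the FSDP-wrapped model with that API, which rank 0 can then save:

```python
from torch.distributed.checkpoint.state_dict import StateDictOptions, get_model_state_dict

def gather_full_state_dict(model):
    """Collect an unsharded state dict on CPU from an FSDP-wrapped model."""
    options = StateDictOptions(full_state_dict=True, cpu_offload=True)
    return get_model_state_dict(model, options=options)
```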
Using BNB DORA 1
Epoch 2, Loss 0.106, LR 1.00e-05: 100%|██████████| 99/99 [03:33<00:00, 2.16s/it]
Finished training 0
CUDA event elapsed time: 211.839921875 sec
time_taken: 211.839921875
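The 211.84 s figure is measured with CUDA events rather than host wall-clock time. A minimal sketch of that measurement:

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
# ... training runs here ...
end.record()

torch.cuda.synchronize()                       # wait for the recorded events to complete
time_taken = start.elapsed_time(end) / 1000.0  # elapsed_time() returns milliseconds
print(f"CUDA event elapsed time: {time_taken} sec")
print(f"time_taken: {time_taken}")
```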
Rank 0: Before forward: 1.62 GiB
Rank 0: After forward: 2.46 GiB
Rank 0: After backward: 2.64 GiB
Rank 0: Peak allocated memory: 1.41 GiB
Rank 0: Peak reserved memory: 2.65 GiB
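The memory lines above come from the CUDA caching-allocator counters, converted to GiB. A minimal sketch of that reporting (the rank and tag strings simply mirror the log format):

```python
import torch

GIB = 1024 ** 3

def report_memory(rank: int, tag: str) -> None:
    """Current allocated memory at a checkpoint, e.g. 'Before forward'."""
    print(f"Rank {rank}: {tag}: {torch.cuda.memory_allocated() / GIB:.2f} GiB")

def report_peaks(rank: int) -> None:
    """Peak allocated / reserved memory since the last reset_peak_memory_stats()."""
    print(f"Rank {rank}: Peak allocated memory: {torch.cuda.max_memory_allocated() / GIB:.2f} GiB")
    print(f"Rank {rank}: Peak reserved memory: {torch.cuda.max_memory_reserved() / GIB:.2f} GiB")
```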
Saving trained LoRA weights.
Done 0
Training completed: 0
Model saved successfully!
Saved files:
   model_state_dict.safetensors
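The single saved artifact is the adapter state dict in safetensors format. A minimal sketch of writing it on rank 0, where `state_dict` is the gathered CPU state dict from the earlier sketch and the `"lora"` key filter is an assumption about how the adapter tensors are named:

```python
from safetensors.torch import save_file

def save_lora_weights(state_dict, path="model_state_dict.safetensors"):
    """Keep only adapter tensors (hypothetical name filter) and write them with safetensors."""
    lora_only = {k: v.contiguous().cpu() for k, v in state_dict.items() if "lora" in k.lower()}
    save_file(lora_only, path)
```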