Update README.md
README.md
CHANGED
@@ -241,7 +241,7 @@ The following hyperparameters were used during training:
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 16
 - total_train_batch_size: 64
-- optimizer:
+- optimizer: _ADAN_ using lucidrains' `adan-pytorch` with default betas
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
 - num_epochs: 2
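
The `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.03` entries above can be read as a linear warmup over the first 3% of steps followed by a cosine decay. The following is a minimal pure-Python sketch of that shape, not the training code used for this model. The function name `cosine_with_warmup` and the step count `TOTAL_STEPS` are hypothetical and chosen only for illustration; the actual number of training steps is not stated in this diff.

```python
import math


def cosine_with_warmup(step: int, total_steps: int, warmup_ratio: float = 0.03) -> float:
    """Return a learning-rate multiplier in [0, 1] for the given step.

    Assumption: linear warmup for ceil(total_steps * warmup_ratio) steps,
    then a single half-cosine decay to zero at total_steps.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to (but not including) 1.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # hypothetical value, not taken from the README

if __name__ == "__main__":
    for s in (0, 10, 30, 500, 1000):
        print(s, round(cosine_with_warmup(s, TOTAL_STEPS), 4))
```

The multiplier is applied to the base learning rate at each optimizer step, so the choice of optimizer (here Adan) is independent of the schedule shape.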