Commit 0e74c3e
Parent(s): bf302be
Update README.md
README.md CHANGED
@@ -77,7 +77,7 @@ The visual embeddings are taken from the CLIP-Vision model and combined with the
 A total length of 128 tokens, including the visual embeddings, is used. The texts are truncated or padded accordingly.

 ### Pretraining
-The checkpoint of the model was trained on Google Cloud Engine TPUv3-8 machine (with 335 GB of RAM, 1000 GB of hard drive, 96 CPU cores) **8 v3 TPU cores** for 60k steps with a per device batch size of 64 and a max sequence length of 128. The optimizer used is Adafactor with a learning rate of 1e-4, learning rate warmup for
+The checkpoint of the model was trained on Google Cloud Engine TPUv3-8 machine (with 335 GB of RAM, 1000 GB of hard drive, 96 CPU cores) **8 v3 TPU cores** for 60k steps with a per device batch size of 64 and a max sequence length of 128. The optimizer used is Adafactor with a learning rate of 1e-4, learning rate warmup for 5,000 steps, and linear decay of the learning rate after.

 We tracked experiments using TensorBoard. Here is the link to the main dashboard: [CLIP Vision BERT CC12M Pre-training Dashboard](https://huggingface.co/flax-community/multilingual-vqa-pt-ckpts/tensorboard)
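For reference, here is a minimal sketch of the schedule the updated line describes, written with JAX/optax: Adafactor at a peak learning rate of 1e-4, linear warmup over 5,000 steps, then linear decay over the remainder of the 60k-step run. This is not the repository's training code; the optax calls and variable names below are illustrative assumptions based only on the numbers stated in the README.

```python
# Illustrative sketch only -- not taken from the multilingual-vqa training script.
# It wires up the schedule described in the README: Adafactor, peak LR 1e-4,
# linear warmup for 5,000 steps, then linear decay over a 60k-step run.
import optax

total_steps = 60_000   # total pretraining steps (from the README)
warmup_steps = 5_000   # warmup steps (from the README)
peak_lr = 1e-4         # peak learning rate (from the README)

# Linear ramp from 0 to peak_lr, then linear decay back to 0 for the rest of training.
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(init_value=0.0, end_value=peak_lr,
                              transition_steps=warmup_steps),
        optax.linear_schedule(init_value=peak_lr, end_value=0.0,
                              transition_steps=total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)

# Adafactor driven by the warmup/decay schedule above; the per-device batch size
# of 64 across 8 TPU v3 cores would be handled separately in the data pipeline.
optimizer = optax.adafactor(learning_rate=schedule)
```

The two `linear_schedule` pieces are joined at the warmup boundary, so swapping the second piece changes only the decay shape, not the warmup.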