max_length -> 100
Files changed:

- README.md (+5 -3)
- config.json (+1 -1)
README.md CHANGED

@@ -11,6 +11,8 @@ license: mit
 
 A large German GPT2.
 
+Also check out [GerPT2](https://huggingface.co/benjamin/gerpt2), a small version of this model.
+
 See the [GPT2 model card](https://huggingface.co/gpt2) for considerations on limitations and bias. See the [GPT2 documentation](https://huggingface.co/transformers/model_doc/gpt2.html) for details on GPT2.
 
 ## Comparison to [dbmdz/german-gpt2](https://huggingface.co/dbmdz/german-gpt2)

@@ -61,8 +63,8 @@ print(tokenizer.decode(output))
 
 ## Training details
 
-GerPT2 is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
-GerPT2 was trained with:
+GerPT2-large is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
+GerPT2-large was trained with:
 
 - a batch size of 256
 - using OneCycle learning rate with a maximum of 5e-3

@@ -71,7 +73,7 @@ GerPT2 was trained with:
 
 Training took roughly 12 days on 8 TPUv3 cores.
 
-To train GerPT2, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
+To train GerPT2-large, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
 
 0. Download and unzip training data from http://data.statmt.org/cc-100/.
 1. Train a tokenizer using `prepare/train_tokenizer.py`. As training data for the tokenizer I used a random subset of 5% of the CC-100 data.
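Note on step 1 above: the actual `prepare/train_tokenizer.py` script lives in the linked GitHub repository and is not part of this diff. Purely as an illustration of what that step involves, here is a minimal sketch using the Hugging Face `tokenizers` library; the input file name and output directory are placeholders, and the vocabulary size is taken from the `"vocab_size": 50257` visible in config.json below.

```python
# Minimal sketch only - not the actual prepare/train_tokenizer.py from the repository.
# Assumes the Hugging Face `tokenizers` library and a plain-text file holding a
# ~5% random subset of the unzipped German CC-100 dump (placeholder file name).
import os
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["cc100_de_subset.txt"],        # placeholder path to the CC-100 subset
    vocab_size=50257,                     # matches "vocab_size" in config.json
    min_frequency=2,
    special_tokens=["<|endoftext|>"],     # GPT2's end-of-text token
)

os.makedirs("gerpt2-tokenizer", exist_ok=True)
tokenizer.save_model("gerpt2-tokenizer")  # writes vocab.json and merges.txt
```

Similarly, the hyperparameters under "Training details" (batch size 256, OneCycle learning rate peaking at 5e-3, weights initialized from the English gpt2-large checkpoint) map onto a schedule like PyTorch's built-in OneCycleLR. The sketch below only shows the shape of that setup under those assumptions; the real run used 8 TPUv3 cores and the repository's own scripts, and `total_steps` is a placeholder, not a value from the README.

```python
# Illustrative only; the repository's actual training code may differ.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-large")  # English weights as the init
optimizer = torch.optim.AdamW(model.parameters())

total_steps = 100_000                                  # placeholder step count
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-3,                                       # peak LR from the README
    total_steps=total_steps,
)

# Per optimizer update (effective batch size 256):
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```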
config.json CHANGED

@@ -32,7 +32,7 @@
   "task_specific_params": {
     "text-generation": {
       "do_sample": true,
-      "max_length":
+      "max_length": 100
     }
   },
   "vocab_size": 50257
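The `task_specific_params` block edited above supplies per-task generation defaults: to my understanding, the transformers text-generation pipeline merges `task_specific_params["text-generation"]` into the model config when it is created, so `do_sample=True` and the new `max_length` of 100 apply without being passed explicitly. A minimal usage sketch, assuming this repository's model id is `benjamin/gerpt2-large` (the README links the small variant at benjamin/gerpt2); the prompt is arbitrary.

```python
# Sketch of how the edited defaults are consumed by the text-generation pipeline.
# Model id assumed to be this repository, benjamin/gerpt2-large.
from transformers import pipeline

generator = pipeline("text-generation", model="benjamin/gerpt2-large")

prompt = "Wikipedia ist ein Projekt zum Aufbau"   # arbitrary German prompt
print(generator(prompt)[0]["generated_text"])     # uses max_length=100 by default

# Callers can still override the configured default explicitly:
print(generator(prompt, max_length=40)[0]["generated_text"])
```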