Kessbitz committed on
Commit aa4825b · verified · 1 Parent(s): 55f803d

Updated tasks

Files changed (1):
  1. README.md +12 -11

README.md CHANGED
@@ -142,7 +142,8 @@ For the Finetuning task, we both filter and sample down to a maximum 10 000 trai
 - **EWoK**: Works similarly to BLiMP but probes the model's internal world knowledge, testing whether the model has both physical and social knowledge. (Ivanova et al., 2024)
 - **Eye Tracking and Self-paced Reading**: Tests whether the model can mimic human eye-tracking and self-paced reading times, using the surprisal of a word as a proxy for the time spent reading it. (de Varda et al., BRM 2024)
 - **Entity Tracking**: Checks whether a model can keep track of changes to the states of entities as text/dialogue unfolds. (Kim & Schuster, ACL 2023)
-- **WUGs**: Tests morphological generalization in LMs through an adjective nominalization task. (Hofmann et al., 2024)
+- **WUGs**: Tests morphological generalization in LMs through an adjective nominalization task and a past-tense task. (Hofmann et al., 2024; Weissweiler et al., 2023)
+- **COMPS**: Tests whether a model attributes properties to the correct concepts. (Misra et al., 2023)
 
 *Finetuning Tasks*
 
@@ -175,16 +176,16 @@ The metrics were chosen based on the advice of the papers the tasks come from.
 
 **TODO: UPDATE**
 
-| Hyperparameter | MNLI, RTE, QQP, MRPC | BoolQ, MultiRC | WSC |
-| --- | --- | --- | --- |
-| Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
-| Batch Size | 32 | 16 | 32 |
-| Epochs | 10 | 10 | 30 |
-| Weight decay | 0.01 | 0.01 | 0.01 |
-| Optimizer | AdamW | AdamW | AdamW |
-| Scheduler | cosine | cosine | cosine |
-| Warmup percentage | 6% | 6% | 6% |
-| Dropout | 0.1 | 0.1 | 0.1 |
+| Hyperparameter | MNLI, RTE, QQP, MRPC, BoolQ, MultiRC | WSC |
+| --- | --- | --- |
+| Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
+| Batch Size | 16 | 16 |
+| Epochs | 10 | 30 |
+| Weight decay | 0.01 | 0.01 |
+| Optimizer | AdamW | AdamW |
+| Scheduler | cosine | cosine |
+| Warmup percentage | 6% | 6% |
+| Dropout | 0.1 | 0.1 |
 
 ## Results
 
@@ -231,7 +232,7 @@ The model took 2.5 hours to train and consumed 755 core hours (with 4 GPUs and 3
 # Citation
 
 ```latex
-@misc{charpentier2025babylmturns3papers,
+@misc{MayerMartinsBKB2025,
 title={ToDo},
 author={Jonas Mayer Martins, Ali Hamza Bashir, Muhammad Rehan Khalid, Lisa Beinborn},
 year={2025},
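The eye-tracking and self-paced-reading task in the diff above uses per-word surprisal as a proxy for human reading time. Surprisal is simply the negative log-probability of a word given its context; the sketch below illustrates the idea with hypothetical probabilities (not the actual evaluation model or pipeline):

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(prob)

# Hypothetical next-word probabilities from some language model
p_predictable = 0.5   # a highly expected word
p_surprising = 0.01   # an unexpected word

print(surprisal(p_predictable))  # 1.0 bit
print(surprisal(p_surprising))   # ~6.64 bits, i.e. predicts a longer reading time
```

Higher surprisal is taken to correlate with longer fixation and reading times, which is what the benchmark compares against the human data.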
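The updated hyperparameter table pairs a cosine learning-rate scheduler with a 6% warmup. A minimal pure-Python sketch of that schedule is shown below; the function name and the step counts are illustrative assumptions, not the repository's actual training code:

```python
import math

# Finetuning hyperparameters from the table above
BASE_LR = 3e-5       # peak learning rate
WARMUP_FRAC = 0.06   # 6% warmup

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * WARMUP_FRAC))
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

total = 1000
print(lr_at(60, total))    # peak of 3e-05, reached at the end of warmup
print(lr_at(1000, total))  # decays to 0.0 at the final step
```

In practice the same shape is what a standard warmup-plus-cosine scheduler (e.g. in common training frameworks) produces for these settings; the sketch only makes the table's "Scheduler: cosine, Warmup: 6%" rows concrete.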