Updated tasks
README.md (CHANGED)
```
@@ -142,7 +142,8 @@ For the Finetuning task, we both filter and sample down to a maximum 10 000 trai
 - **EWoK**: Works similarly to BLiMP but probes the model's internal world knowledge, testing whether a model has both physical and social knowledge. (Ivanova et al., 2024)
 - **Eye Tracking and Self-paced Reading**: Tests whether the model can mimic human eye-tracking and self-paced reading times, using the surprisal of a word as a proxy for the time spent reading it. (de Varda et al., BRM 2024)
 - **Entity Tracking**: Checks whether a model can keep track of changes to the states of entities as text/dialogue unfolds. (Kim & Schuster, ACL 2023)
-- **WUGs**: Tests morphological generalization in LMs through an adjective nominalization task. (Hofmann et al., 2024)
+- **WUGs**: Tests morphological generalization in LMs through adjective nominalization and past-tense tasks. (Hofmann et al., 2024; Weissweiler et al., 2023)
+- **COMPS**: Tests knowledge of the properties of everyday concepts. (Misra et al., 2023)
 
 *Finetuning Tasks*
 
```
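The eye-tracking and self-paced reading task above relies on surprisal, the negative log-probability of a word given its preceding context: s(w_t) = -log2 P(w_t | w_1..t-1). Below is a minimal sketch of computing per-token surprisal with a causal LM; `gpt2` is purely a placeholder checkpoint, not the repository's model or evaluation harness:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model under evaluation.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The children went outside to play."
enc = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits  # (1, seq_len, vocab_size)

# Position t-1 predicts token t, so shift logits and targets by one.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
targets = enc["input_ids"][0, 1:]
token_logprobs = log_probs.gather(1, targets[:, None]).squeeze(1)
surprisal_bits = -token_logprobs / math.log(2)  # nats -> bits

# Word-level surprisal is the sum over a word's subword tokens.
for tok, s in zip(tokenizer.convert_ids_to_tokens(targets.tolist()), surprisal_bits):
    print(f"{tok:>12}  {s.item():6.2f} bits")
```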
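The hunk header also mentions filtering and sampling the finetuning data down to at most 10 000 training examples. The filtering criteria are not visible in this excerpt, so the following sketches only the sampling step, assuming the Hugging Face `datasets` library and MNLI as a stand-in task:

```python
from datasets import load_dataset

# MNLI as a stand-in task; the repository's filtering step is not shown here.
train = load_dataset("glue", "mnli", split="train")

# Shuffle, then cap the (already filtered) split at 10 000 examples.
train = train.shuffle(seed=42).select(range(min(10_000, len(train))))
print(len(train))  # 10000
```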
```
@@ -175,16 +176,16 @@ The metrics were chosen based on the advice of the papers the tasks come from.
 
 **TODO: UPDATE**
 
-| Hyperparameter | MNLI, RTE, QQP, MRPC
+| Hyperparameter | MNLI, RTE, QQP, MRPC, BoolQ, MultiRC | WSC |
 | --- | --- | --- |
-| Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
-| Batch Size |
-| Epochs | 10 |
-| Weight decay | 0.01
-| Optimizer | AdamW | AdamW |
-| Scheduler | cosine | cosine |
-| Warmup percentage | 6% | 6% |
-| Dropout | 0.1 | 0.1 |
+| Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
+| Batch Size | 16 | 16 |
+| Epochs | 10 | 30 |
+| Weight decay | 0.01 | 0.01 |
+| Optimizer | AdamW | AdamW |
+| Scheduler | cosine | cosine |
+| Warmup percentage | 6% | 6% |
+| Dropout | 0.1 | 0.1 |
 
 ## Results
 
```
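The hyperparameters in the table map directly onto a standard finetuning setup. A sketch assuming a Hugging Face `Trainer`, which the README does not itself specify; the checkpoint path and `num_labels` are placeholders:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint; num_labels is task-dependent
# (e.g. 3 for MNLI, 2 for RTE/QQP/MRPC/BoolQ).
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/pretrained-checkpoint",
    num_labels=3,
    hidden_dropout_prob=0.1,  # dropout 0.1 (BERT-style config field)
)

args = TrainingArguments(
    output_dir="finetune-out",
    learning_rate=3e-5,              # 3*10^-5
    per_device_train_batch_size=16,  # batch size 16
    num_train_epochs=10,             # 30 for WSC
    weight_decay=0.01,
    optim="adamw_torch",             # AdamW
    lr_scheduler_type="cosine",      # cosine schedule
    warmup_ratio=0.06,               # 6% warmup
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=...,  # filtered/sampled task split
#                   eval_dataset=...)
# trainer.train()
```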
```
@@ -231,7 +232,7 @@ The model took 2.5 hours to train and consumed 755 core hours (with 4 GPUs and 3
 # Citation
 
 ```latex
-@misc{
+@misc{MayerMartinsBKB2025,
   title={ToDo},
   author={Jonas Mayer Martins and Ali Hamza Bashir and Muhammad Rehan Khalid and Lisa Beinborn},
   year={2025},
```
|