)
```
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing

Benchmark Scores

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-------------|------:|------|-----:|--------|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |0.6894|± |0.0135|
| | |none | 0|acc_norm|0.6860|± |0.0136|

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---------|------:|------|-----:|--------|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |0.7092|± |0.0045|
| | |none | 0|acc_norm|0.8736|± |0.0033|

| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|--------------|------:|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2| 2|none | 0|acc |0.7126|± | 0.015|

| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6225|± |0.1292|
| - humanities |N/A |none | 0|acc |0.5745|± |0.1286|
| - other |N/A |none | 0|acc |0.6952|± |0.1095|
| - social_sciences|N/A |none | 0|acc |0.7280|± |0.0735|
| - stem |N/A |none | 0|acc |0.5195|± |0.1313|

| Tasks |Version|Filter|n-shot|Metric|Value| |Stderr|
|----------|------:|------|-----:|------|----:|---|-----:|
|winogrande| 1|none | 0|acc |0.824|± |0.0107|

|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k| 2|get-answer| 5|exact_match|0.7263|± |0.0123|

Average = 74.08
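The reported average appears to be the unweighted mean of one headline score per benchmark. Taking `acc_norm` for ARC-Challenge and HellaSwag, and the single listed metric for TruthfulQA MC2, MMLU, Winogrande and GSM8K, reproduces 74.08 exactly. Using plain `acc` for ARC-Challenge and HellaSwag instead gives 71.40. The metric choice is an inference from the numbers, not something stated above, and the names `headline_scores`, `acc_only_scores` and `average_percent` are hypothetical. The following minimal sketch checks the arithmetic:

```python
# Headline scores copied from the tables above, as fractions in [0, 1].
# Assumption: acc_norm is used for ARC-Challenge and HellaSwag.
headline_scores = {
    "arc_challenge (acc_norm)": 0.6860,
    "hellaswag (acc_norm)": 0.8736,
    "truthfulqa_mc2 (acc)": 0.7126,
    "mmlu (acc)": 0.6225,
    "winogrande (acc)": 0.8240,
    "gsm8k (exact_match)": 0.7263,
}

# Same six tasks, but with plain acc for ARC-Challenge and HellaSwag.
acc_only_scores = dict(headline_scores)
del acc_only_scores["arc_challenge (acc_norm)"]
del acc_only_scores["hellaswag (acc_norm)"]
acc_only_scores["arc_challenge (acc)"] = 0.6894
acc_only_scores["hellaswag (acc)"] = 0.7092


def average_percent(scores: dict[str, float]) -> float:
    """Unweighted mean of the scores, scaled to a percentage."""
    return 100 * sum(scores.values()) / len(scores)


print(f"acc_norm variant: {average_percent(headline_scores):.2f}")  # 74.08
print(f"acc-only variant: {average_percent(acc_only_scores):.2f}")  # 71.40
```

Only the `acc_norm` variant matches the reported figure, which is why that selection is assumed here.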