alexmarques committed
Commit e7dff4c · verified · 1 parent: 66c60db

Update README.md

Files changed (1):
  1. README.md (+116 -7)

README.md CHANGED
@@ -40,7 +40,7 @@ This model should be used with the [meta-llama/Llama-3.3-70B-Instruct](https://h
 
 ```bash
 vllm serve meta-llama/Llama-3.3-70B-Instruct \
-    -tp 2 \
     --speculative-config '{
       "model": "RedHatAI/Llama-3.3-70B-Instruct-speculator.eagle3",
       "num_speculative_tokens": 3,
@@ -50,10 +50,119 @@ vllm serve meta-llama/Llama-3.3-70B-Instruct \
 
 ## Evaluations
 
-Subset of GSM8k (math reasoning):
-* acceptance_rate = [80.1, 63.7, 46.4]
-* conditional_acceptance_rate = [80.1, 79.5, 72.9]
-
-Subset of MTBench:
-* acceptance_rate = [73.3, 53.7, 38.4]
-* conditional_acceptance_rate = [73.3, 73.3, 71.5]
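A side note on the removed metrics: the acceptance_rate list appears to be the running product of the conditional_acceptance_rate list (draft position i is only reached if all earlier draft tokens were accepted), and summing those cumulative rates gives the expected number of accepted tokens per step. A quick check, under my reading of the metrics (the README does not define them):

```python
# Assumption (not stated in the README): acceptance_rate[i] is the
# cumulative product of conditional_acceptance_rate[0..i], in percent.
def cumulative_rates(conditional_pct):
    out, running = [], 1.0
    for r in conditional_pct:
        running *= r / 100
        out.append(round(100 * running, 1))
    return out

# Expected accepted tokens per step: 1 (the target model always emits
# one token) plus the sum of the cumulative acceptance probabilities.
def acceptance_length(conditional_pct):
    return 1 + sum(r / 100 for r in cumulative_rates(conditional_pct))

print(cumulative_rates([80.1, 79.5, 72.9]))  # [80.1, 63.7, 46.4] (GSM8k)
print(cumulative_rates([73.3, 73.3, 71.5]))  # [73.3, 53.7, 38.4] (MTBench)
print(round(acceptance_length([80.1, 79.5, 72.9]), 2))  # ≈ 2.9
```

The GSM8k value (~2.90) lines up with the 2.89 reported for Math Reasoning at k=3 in the updated table, which suggests the new table reports exactly this quantity.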
 
 ```bash
 vllm serve meta-llama/Llama-3.3-70B-Instruct \
+    -tp 4 \
     --speculative-config '{
       "model": "RedHatAI/Llama-3.3-70B-Instruct-speculator.eagle3",
       "num_speculative_tokens": 3,
 
 ## Evaluations
 
+<h3>Use cases</h3>
+<table>
+  <thead>
+    <tr>
+      <th>Use Case</th>
+      <th>Dataset</th>
+      <th>Number of Samples</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Coding</td>
+      <td>HumanEval</td>
+      <td>168</td>
+    </tr>
+    <tr>
+      <td>Math Reasoning</td>
+      <td>gsm8k</td>
+      <td>80</td>
+    </tr>
+    <tr>
+      <td>Text Summarization</td>
+      <td>CNN/Daily Mail</td>
+      <td>80</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3>Acceptance lengths</h3>
+<table>
+  <thead>
+    <tr>
+      <th>Use Case</th>
+      <th>k=1</th>
+      <th>k=2</th>
+      <th>k=3</th>
+      <th>k=4</th>
+      <th>k=5</th>
+      <th>k=6</th>
+      <th>k=7</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Coding</td>
+      <td></td>
+      <td></td>
+      <td></td>
+      <td></td>
+      <td></td>
+      <td></td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>Math Reasoning</td>
+      <td>1.80</td>
+      <td>2.44</td>
+      <td>2.89</td>
+      <td>3.15</td>
+      <td>3.33</td>
+      <td>3.44</td>
+      <td>3.52</td>
+    </tr>
+    <tr>
+      <td>Text Summarization</td>
+      <td>1.72</td>
+      <td>2.21</td>
+      <td>2.53</td>
+      <td>2.74</td>
+      <td>2.86</td>
+      <td>2.93</td>
+      <td>2.98</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3>Performance benchmarking (4xA100)</h3>
+<div style="display: flex; justify-content: center; gap: 20px;">
+
+<figure style="text-align: center;">
+  <img src="assets/Llama-3.3-70B-Instruct-HumanEval.png" alt="Coding" width="100%">
+  <figcaption><b>(a)</b> Acceptance lengths — Coding</figcaption>
+</figure>
+
+<figure style="text-align: center;">
+  <img src="assets/Llama-3.3-70B-Instruct-math_reasoning.png" alt="Math Reasoning" width="100%">
+  <figcaption><b>(b)</b> Acceptance lengths — Math Reasoning</figcaption>
+</figure>
+
+<figure style="text-align: center;">
+  <img src="assets/Llama-3.3-70B-Instruct-summarization.png" alt="Text Summarization" width="100%">
+  <figcaption><b>(c)</b> Acceptance lengths — Text Summarization</figcaption>
+</figure>
+</div>
+
+<details> <summary>Details</summary>
+<strong>Configuration</strong>
+
+- temperature: 0
+- repetitions: 5
+- time per experiment: 4 min
+- hardware: 4xA100
+- vLLM version: 0.11.0
+- GuideLLM version: 0.3.0
+
+<strong>Command</strong>
+```bash
+GUIDELLM__PREFERRED_ROUTE="chat_completions" \
+guidellm benchmark \
+  --target "http://localhost:8000/v1" \
+  --data "RedHatAI/SpeculativeDecoding" \
+  --rate-type sweep \
+  --max-seconds 240 \
+  --output-path "Llama-3.3-70B-Instruct-HumanEval.json" \
+  --backend-args '{"extra_body": {"chat_completions": {"temperature": 0.0}}}'
+```
+</details>
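Acceptance length alone does not equal speedup: each speculative step also pays for the draft-model passes. A back-of-envelope sketch (my own, with an assumed draft-to-target cost ratio; not a measurement from this card):

```python
# Rough speedup model (a sketch, not a measurement): one target forward
# pass costs 1 unit, one draft pass costs DRAFT_COST units. A step with
# k draft tokens costs 1 + k * DRAFT_COST and emits `acceptance_length`
# tokens on average, vs. 1 token per unit cost without speculation.
DRAFT_COST = 0.05  # assumed relative cost of the EAGLE-3 draft head

def est_speedup(acceptance_length, k, draft_cost=DRAFT_COST):
    return acceptance_length / (1 + k * draft_cost)

# Math Reasoning acceptance lengths from the table above.
for k, al in [(1, 1.80), (3, 2.89), (5, 3.33), (7, 3.52)]:
    print(f"k={k}: ~{est_speedup(al, k):.2f}x")
```

Under this toy model the gain flattens, and can even dip, past k≈5, which is one reason a moderate setting like the num_speculative_tokens=3 used in the serve command above can be a reasonable default.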