Update README.md
README.md CHANGED

@@ -36,6 +36,18 @@ This is a speculator model designed for use with [meta-llama/Llama-3.3-70B-Instr
It was trained using the [speculators](https://github.com/vllm-project/speculators) library on a combination of the [Aeala/ShareGPT_Vicuna_unfiltered](https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered) and the `train_sft` split of [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) datasets.
This model should be used with the [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) chat template, specifically through the `/chat/completions` endpoint.

+## Use with vLLM
+
+```bash
+vllm serve meta-llama/Llama-3.3-70B-Instruct \
+    -tp 2 \
+    --speculative-config '{
+        "model": "RedHatAI/Llama-3.3-70B-Instruct-speculator.eagle3",
+        "num_speculative_tokens": 3,
+        "method": "eagle3"
+    }'
+```
+
## Evaluations

Subset of GSM8k (math reasoning):
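Once the server from the `vllm serve` command above is running, requests are sent through the OpenAI-compatible `/chat/completions` route that the README calls out. The sketch below is illustrative, not part of the diff: it assumes vLLM's default port (8000) and the default served model name, and uses a made-up prompt.

```bash
# Hedged example: query the OpenAI-compatible endpoint exposed by `vllm serve`.
# Assumes the default port 8000 and served model name; the prompt is illustrative.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [
          {"role": "user", "content": "What is 25 * 17?"}
        ],
        "temperature": 0
      }'
```

Routing requests through `/chat/completions` (rather than `/completions`) matters here because it applies the Llama-3.3-70B-Instruct chat template that the speculator was trained against.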