Update README.md
README.md CHANGED

@@ -36,6 +36,18 @@ This is a speculator model designed for use with [meta-llama/Llama-3.3-70B-Instr
It was trained using the [speculators](https://github.com/vllm-project/speculators) library on a combination of the [Aeala/ShareGPT_Vicuna_unfiltered](https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered) and the `train_sft` split of [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) datasets.
This model should be used with the [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) chat template, specifically through the `/chat/completions` endpoint.

+## Use with vLLM
+
+```bash
+vllm serve meta-llama/Llama-3.3-70B-Instruct \
+    -tp 2 \
+    --speculative-config '{
+        "model": "RedHatAI/Llama-3.3-70B-Instruct-speculator.eagle3",
+        "num_speculative_tokens": 3,
+        "method": "eagle3"
+    }'
+```
+
## Evaluations

Subset of GSM8k (math reasoning):
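Once the server from the `vllm serve` command above is running, requests are sent through the OpenAI-compatible `/chat/completions` route that the README calls out. The sketch below is illustrative, not part of the diff: it assumes vLLM's default port (8000) and the default served model name, and uses a made-up prompt.

```bash
# Hedged example: query the OpenAI-compatible endpoint exposed by `vllm serve`.
# Assumes the default port 8000 and served model name; the prompt is illustrative.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [
          {"role": "user", "content": "What is 25 * 17?"}
        ],
        "temperature": 0
      }'
```

Routing requests through `/chat/completions` (rather than `/completions`) matters here because it applies the Llama-3.3-70B-Instruct chat template that the speculator was trained against.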