ekurtic committed · verified
Commit ba1c53f · 1 Parent(s): ef3a876

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -36,6 +36,18 @@ This is a speculator model designed for use with [meta-llama/Llama-3.3-70B-Instr
  It was trained using the [speculators](https://github.com/vllm-project/speculators) library on a combination of the [Aeala/ShareGPT_Vicuna_unfiltered](https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered) and the `train_sft` split of [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) datasets.
  This model should be used with the [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) chat template, specifically through the `/chat/completions` endpoint.

+ ## Use with vLLM
+
+ ```bash
+ vllm serve meta-llama/Llama-3.3-70B-Instruct \
+     -tp 2 \
+     --speculative-config '{
+         "model": "RedHatAI/Llama-3.3-70B-Instruct-speculator.eagle3",
+         "num_speculative_tokens": 3,
+         "method": "eagle3"
+     }'
+ ```
+
  ## Evaluations

  Subset of GSM8k (math reasoning):
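
The added "Use with vLLM" section starts an OpenAI-compatible server, so the `/chat/completions` guidance above can be exercised directly against it. A minimal sketch (not part of this commit), assuming vLLM's default endpoint at `localhost:8000` and the base model path as the served model id:

```bash
# Query the server launched by the `vllm serve` command above; the EAGLE-3
# speculator accelerates decoding transparently, so the request targets the
# base model name.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [
          {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
        ],
        "max_tokens": 256
      }'
```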