---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
tags:
- eagle
- eagle3
- speculative-decoding
- draft-model
- sglang
- qwen3
- code
language:
- en
- zh
pipeline_tag: text-generation
---

# SGLang-EAGLE3-Qwen3-Coder-30B-A3B-Instruct

This is an **EAGLE3 draft model** for speculative decoding with [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct).

## Model Description

EAGLE3 is the third generation of EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative decoding technique that uses a lightweight draft model to predict several future tokens, which the target model then verifies in parallel. Because every token is checked by the target model, this typically accelerates inference by 2-3x with no loss in output quality.

### Key Features

- **Target Model**: Qwen3-Coder-30B-A3B-Instruct (30B parameters, 3B active)
- **Draft Model Size**: ~350 MB (single transformer layer)
- **Training Data**: OpenPromptContainer (OPC) regenerated dataset
- **Training Steps**: 295,000 (epoch 1)
- **Framework**: Trained with [SpecForge](https://github.com/sgl-project/SpecForge)

### Training Metrics

| Metric | Value |
|--------|-------|
| First Token Accuracy (acc_0) | 88.19% |
| Average Accuracy (7 positions) | 85.19% |
| Training Epochs | 1+ (295k steps) |

## Usage

### With SGLang

```python
import sglang as sgl

# Launch the target model with EAGLE3 speculative decoding
llm = sgl.Engine(
    model_path="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    speculative_algorithm="EAGLE3",
    speculative_draft_model_path="sgl-project/SGLang-EAGLE3-Qwen3-Coder-30B-A3B-Instruct",
    speculative_num_steps=5,
    speculative_eagle_topk=8,
    speculative_num_draft_tokens=64,
)

# Generate text
output = llm.generate("Write a Python function to sort a list:")
print(output)
```

### With SGLang Server

```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path sgl-project/SGLang-EAGLE3-Qwen3-Coder-30B-A3B-Instruct \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 64 \
    --tp 8
```

Once the server is running, it can be queried through SGLang's OpenAI-compatible API; see the client example at the end of this card.

## Model Architecture

The EAGLE3 draft model is a lightweight transformer that:

- Shares embeddings with the target model
- Uses a single transformer layer (hidden_size=2048, intermediate_size=12288)
- Predicts multiple future tokens autoregressively
- Uses the target model's hidden states as input

```json
{
  "architectures": ["LlamaForCausalLMEagle3"],
  "hidden_size": 2048,
  "intermediate_size": 12288,
  "num_attention_heads": 32,
  "num_key_value_heads": 4,
  "num_hidden_layers": 1,
  "vocab_size": 151936
}
```

## Training Details

- **Framework**: SpecForge with SGLang backend
- **Hardware**: 4x NVIDIA H200 GPUs (TP=4)
- **Batch Size**: 1 per GPU
- **Learning Rate**: 1e-4 with cosine annealing
- **Max Sequence Length**: 4096
- **Attention Backend**: FlexAttention

## Citation

If you use this model, please cite:

```bibtex
@article{li2024eagle,
  title={EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2401.15077},
  year={2024}
}

@misc{sglang2024,
  title={SGLang: Efficient Execution of Structured Language Model Programs},
  author={Zheng, Lianmin and others},
  year={2024},
  url={https://github.com/sgl-project/sglang}
}
```

## License

This model is released under the Apache 2.0 License, following the base model's license.
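
## Querying the Server (Example)

A minimal client sketch for the server launched in the Usage section, using SGLang's OpenAI-compatible endpoint. It assumes the default port 30000 and the `openai` Python package; adjust the base URL if you start the server with a different `--port`. Speculative decoding is transparent to the client, so no extra request parameters are needed.

```python
# Minimal client sketch for the SGLang server launched above.
# Assumes the default port 30000; SGLang accepts any API key by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    messages=[
        {"role": "user", "content": "Write a Python function to sort a list."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```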
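
## Appendix: Draft-and-Verify in a Toy Sketch

To make the draft-then-verify idea from the Model Description concrete, the toy below walks through greedy speculative decoding with plain Python functions standing in for the draft and target models. It is purely illustrative and is not SGLang's or EAGLE3's implementation; a real system scores all drafted positions in a single batched forward pass of the target model, and the hidden-state conditioning used by EAGLE3 is omitted here.

```python
# Toy illustration of greedy speculative decoding: plain Python functions
# stand in for the draft and target models. NOT SGLang's or EAGLE3's code.

def draft_model(tokens):
    """Cheap draft model: proposes last_token + 1 (toy rule)."""
    return tokens[-1] + 1

def target_model(tokens):
    """Expensive target model: disagrees with the draft on every 5th token (toy rule)."""
    nxt = tokens[-1] + 1
    return nxt if nxt % 5 != 0 else 0

def speculative_step(tokens, num_draft=5):
    """One step: draft num_draft tokens, then verify them against the target."""
    # 1) Draft phase: the small model proposes a chain of tokens.
    drafted, ctx = [], list(tokens)
    for _ in range(num_draft):
        t = draft_model(ctx)
        drafted.append(t)
        ctx.append(t)

    # 2) Verify phase: accept drafted tokens while they match the target's
    #    greedy choice; on the first mismatch, keep the target's token instead.
    #    The accepted sequence is therefore exactly what the target alone
    #    would have produced greedily -- no loss in output quality.
    accepted, ctx = [], list(tokens)
    for t in drafted:
        target_t = target_model(ctx)
        accepted.append(target_t)
        ctx.append(target_t)
        if t != target_t:
            break
    else:
        # Every draft was accepted; the verify pass yields one bonus token.
        accepted.append(target_model(ctx))

    return tokens + accepted

if __name__ == "__main__":
    seq = [1]
    for step in range(3):
        seq = speculative_step(seq)
        print(f"after step {step + 1}: {seq}")
```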