Qwen2.5-1.5B Math Reasoning (GPT-4o-mini Teacher)

A LoRA adapter for Qwen2.5-1.5B-Instruct, trained via imitation learning (behavior cloning) on chain-of-thought solutions generated by GPT-4o-mini. See the full project on GitHub for training code and experimental details.

Try the live demo in the companion Hugging Face Space.

Performance

Model                           Accuracy   Relative Improvement
Qwen2.5-1.5B-Instruct (base)    60.96%     -
This model                      67.17%     +10.2%

Evaluated on the GSM8K test set (1,319 problems).
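
For reference, here is a minimal sketch of how such an exact-match evaluation can be run. It mirrors the prompt and greedy decoding from the Usage section below; the extract_answer helper and its regex are our own illustration, not necessarily the exact harness used to produce the numbers above.

import re
import torch
from datasets import load_dataset
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and attach the adapter (same as in Usage below)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "v-vasi/qwen-math-gpt4o-imitation")

def extract_answer(text):
    # GSM8K convention: the final numeric answer follows '####'
    m = re.search(r"####\s*\$?([-+]?[\d,]*\.?\d+)", text)
    return m.group(1).replace(",", "") if m else None

test = load_dataset("gsm8k", "main", split="test")  # 1,319 problems
correct = 0
for ex in test:
    messages = [{"role": "user", "content": f"Solve step by step. End with the answer after '####'.\n\n{ex['question']}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, then compare extracted answers
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    if extract_answer(completion) == extract_answer(ex["answer"]):
        correct += 1

print(f"Accuracy: {correct / len(test):.2%}")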

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model, then attach the LoRA adapter on top
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "v-vasi/qwen-math-gpt4o-imitation")

question = "Janet's ducks lay 16 eggs per day. She eats three for breakfast and bakes muffins with four. She sells the rest for $2 each. How much does she make daily?"

# Ask for step-by-step reasoning ending with the GSM8K-style '#### <answer>' marker
messages = [{"role": "user", "content": f"Solve step by step. End with the answer after '####'.\n\n{question}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps the solution deterministic
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
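
Continuing from the snippet above: if you want to serve the model without a PEFT dependency at inference time, the LoRA weights can be folded into the base model with PEFT's merge_and_unload. The output directory name below is arbitrary; the merged model behaves identically.

# Fold the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("qwen-math-merged")    # arbitrary output path
tokenizer.save_pretrained("qwen-math-merged")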

Training

  • Method: Imitation learning (behavior cloning)
  • Teacher: GPT-4o-mini
  • Dataset: GSM8K training set (7,473 examples)
  • LoRA: rank=64, alpha=128, dropout=0.08
  • Training: 2 epochs, lr=1e-5, batch size 16 (a matching configuration is sketched below)
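
A minimal sketch of a matching fine-tuning setup under stated assumptions: the hyperparameters come from the list above, but the target_modules (the standard Qwen2 attention and MLP projections) are an assumption the card does not confirm, and the dataset construction and training loop are omitted.

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA settings from the list above; target_modules are assumed, not confirmed
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.08,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

args = TrainingArguments(
    output_dir="qwen-math-sft",        # arbitrary output path
    num_train_epochs=2,
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    bf16=True,
    logging_steps=50,
)
# Pair each GSM8K question with its GPT-4o-mini solution, tokenize with the
# chat template, and train with a standard Trainer/SFTTrainer loop (omitted).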

Limitations

  • Grade school math only (arithmetic, basic algebra, percentages)
  • Expects the GSM8K-style '#### <answer>' marker for answer extraction (see the evaluation sketch under Performance)
  • May not generalize to advanced mathematics

Citation

@article{cobbe2021gsm8k,
  title={Training Verifiers to Solve Math Word Problems},
  author={Cobbe, Karl and others},
  journal={arXiv preprint arXiv:2110.14168},
  year={2021}
}

License

Apache 2.0
