Paper: *Training Verifiers to Solve Math Word Problems* (arXiv:2110.14168)
A LoRA adapter for Qwen2.5-1.5B-Instruct, trained via imitation learning on chain-of-thought reasoning traces generated by GPT-4o-mini. See the full project on GitHub for the training code and experimental details.
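As a rough illustration of the setup (not the exact configuration, which lives in the GitHub repo), the adapter can be attached with `peft` before supervised fine-tuning on (question, GPT-4o-mini chain-of-thought solution) pairs. The rank, alpha, dropout, and target modules below are assumptions for the sketch:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# Assumed hyperparameters for illustration; see the GitHub repo for the values actually used.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# The wrapped model is then fine-tuned with a standard causal-LM loss on
# GPT-4o-mini chain-of-thought solutions (imitation learning).
```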
| Model | Accuracy | Relative Improvement |
|---|---|---|
| Qwen2.5-1.5B-Instruct (base) | 60.96% | - |
| This model | 67.17% | +10.2% |
Evaluated on the GSM8K test set (1,319 problems). The +10.2% figure is relative: (67.17 − 60.96) / 60.96 ≈ 10.2%, i.e. +6.21 percentage points absolute. A sketch of the evaluation loop follows the usage example below.
Usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "v-vasi/qwen-math-gpt4o-imitation")

# Prompt for step-by-step reasoning that ends with a GSM8K-style '####' answer
question = "Janet's ducks lay 16 eggs per day. She eats three for breakfast and bakes muffins with four. She sells the rest for $2 each. How much does she make daily?"
messages = [{"role": "user", "content": f"Solve step by step. End with the answer after '####'.\n\n{question}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding for reproducible outputs
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
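The accuracy in the table above is exact match on the final numeric answer. Continuing from the snippet above (reusing `model`, `tokenizer`, and `torch`), here is a rough sketch of how such an evaluation could be run, assuming the `openai/gsm8k` dataset on the Hub; the `solve` helper is illustrative, and the actual harness is in the GitHub repo.

```python
import re
from datasets import load_dataset

def extract_answer(text):
    """Pull the number after '####', the GSM8K answer delimiter."""
    match = re.search(r"####\s*([-+]?[\d,.]+)", text)
    return match.group(1).replace(",", "").rstrip(".") if match else None

def solve(question):
    # Illustrative helper: same prompt and greedy decoding as the usage snippet above.
    msgs = [{"role": "user", "content": f"Solve step by step. End with the answer after '####'.\n\n{question}"}]
    text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    batch = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**batch, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(out[0][batch["input_ids"].shape[1]:], skip_special_tokens=True)

test_set = load_dataset("openai/gsm8k", "main", split="test")  # 1,319 problems
correct = sum(
    extract_answer(solve(ex["question"])) == extract_answer(ex["answer"])
    for ex in test_set
)
print(f"Accuracy: {correct / len(test_set):.2%}")
```

For deployment without a runtime `peft` dependency, the adapter can also be merged into the base weights with `model.merge_and_unload()` and saved as a standalone model.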
Responses end with the final answer after `####`, matching the GSM8K answer format used for extraction.

GSM8K citation:

```bibtex
@article{cobbe2021gsm8k,
  title={Training Verifiers to Solve Math Word Problems},
  author={Cobbe, Karl and others},
  journal={arXiv preprint arXiv:2110.14168},
  year={2021}
}
```
License: Apache 2.0