AIPlans/Qwen3-0.6B-KTO

Model Card for AIPlans/Qwen3-0.6B-KTO

This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained using KTO (Kahneman-Tversky Optimization) on the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.

Model Details

Model Description

This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using KTO for preference optimization. The goal of the fine-tuning was to improve helpfulness/harmlessness behavior as measured by the HelpSteer2 dataset, while also enabling controlled model diffing experiments as part of the AIPlans research workflow.

Special care was taken to reduce GPU memory usage during KTO training, including gradient checkpointing, selective layer freezing, and reduced forward-pass caching.
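For illustration, the following is a minimal sketch of such a memory-conscious KTO setup using TRL's KTOTrainer. The hyperparameters, the choice of frozen layers, and the exact argument names (which vary across TRL versions) are placeholders, not the published training configuration:

```python
# Minimal KTO training sketch (illustrative; hyperparameters and frozen-layer
# choices are placeholders, not the exact settings used for this model).
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.config.use_cache = False  # KV caching is incompatible with checkpointing

# Selective layer freezing: train only the last few transformer blocks
# to cut gradient and optimizer-state memory (layer indices are illustrative).
for name, param in model.named_parameters():
    param.requires_grad = any(f"layers.{i}." in name for i in range(24, 28))

# Tiny stand-in dataset in TRL's unpaired KTO format: prompt / completion / label.
train_dataset = Dataset.from_dict({
    "prompt": ["How do I sort a list in Python?"] * 2,
    "completion": ["Use the built-in sorted() function.", "Figure it out yourself."],
    "label": [True, False],  # True = desirable, False = undesirable
})

config = KTOConfig(
    output_dir="qwen3-0.6b-kto",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,  # trade recompute for activation memory
    beta=0.1,                     # controls deviation from the reference model
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = KTOTrainer(
    model=model,  # reference model is created internally when not supplied
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```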

Developed by: AIPlans

Funded by: AIPlans

Shared by: AIPlans

Model type: Causal decoder-only Transformer (LLM)

Languages: English

License: MIT (inherits from base model and dataset licensing)

Fine-tuned from: Qwen/Qwen3-0.6B

Training Method: Kahneman-Tversky Optimization (KTO)

Intended Use: Research on model diffing, preference fine-tuning, evaluation of lightweight LLM behavior changes
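For quick experimentation, here is a minimal inference sketch with the Hugging Face transformers library (the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIPlans/Qwen3-0.6B-KTO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give three tips for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```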

Training Details

Training Data

We used the HelpSteer2 dataset by NVIDIA. To convert it into a preference dataset, we applied a threshold of 3 on the helpfulness score: responses scoring below 3 were assigned to the rejected class, responses scoring above 3 to the accepted class, and responses scoring exactly 3 were discarded.

For more information, refer to the GitHub script.
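As a rough sketch of that conversion (the column names prompt, completion, and label assume TRL's unpaired KTO data format; the canonical implementation is the GitHub script referenced above):

```python
from datasets import load_dataset

# HelpSteer2 scores helpfulness on a 0-4 scale.
raw = load_dataset("nvidia/HelpSteer2", split="train")

def to_kto(example):
    return {
        "prompt": example["prompt"],
        "completion": example["response"],
        "label": example["helpfulness"] > 3,  # True = accepted, False = rejected
    }

# Discard samples scoring exactly 3, then map to the unpaired preference format.
kto_dataset = (
    raw.filter(lambda ex: ex["helpfulness"] != 3)
       .map(to_kto, remove_columns=raw.column_names)
)
```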

Evaluation

Below is a comparison between the base Qwen3-0.6B model and our KTO-trained version (trained using HelpSteer2 preference data).

πŸ“Š Benchmark Comparison

| Task             | Metric   | Base Model | KTO Model | Change  |
|------------------|----------|------------|-----------|---------|
| ARC Challenge    | acc      | 0.3148     | 0.3123    | -0.0025 |
| ARC Challenge    | acc_norm | 0.3447     | 0.3422    | -0.0025 |
| ARC Easy         | acc      | 0.6044     | 0.6107    | +0.0063 |
| ARC Easy         | acc_norm | 0.5589     | 0.5602    | +0.0013 |
| HellaSwag        | acc      | 0.3751     | 0.3776    | +0.0025 |
| HellaSwag        | acc_norm | 0.4738     | 0.4766    | +0.0028 |
| TruthfulQA (mc2) | acc      | 0.4275     | 0.4324    | +0.0049 |
| WinoGrande       | acc      | 0.5604     | 0.5533    | -0.0071 |
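The task and metric names in this table follow lm-evaluation-harness conventions. Assuming that harness produced the scores, a comparison along these lines could be reproduced roughly as follows (a sketch, not the exact evaluation setup):

```python
import lm_eval

# Evaluate the base and KTO models on the same task set
# (sketch; assumes EleutherAI's lm-evaluation-harness).
tasks = ["arc_challenge", "arc_easy", "hellaswag", "truthfulqa_mc2", "winogrande"]

for model_id in ["Qwen/Qwen3-0.6B", "AIPlans/Qwen3-0.6B-KTO"]:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id},dtype=bfloat16",
        tasks=tasks,
    )
    print(model_id, {task: results["results"][task] for task in tasks})
```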

πŸ“Œ Summary

The KTO-trained model shows:

  • Improvements on ARC-Easy, HellaSwag, and TruthfulQA (mc2)
  • Minor regressions on ARC-Challenge and WinoGrande (within standard error)
  • A small overall gain in truthfulness without compromising reasoning ability

These results suggest that preference optimization using HelpSteer2 modestly improves alignment and factual helpfulness while maintaining core model capabilities.

Model Card Authors

Jithesh Pavan D Souza - AIPlans Research Intern

Model Card Contact

Jithesh - [email protected]
