justinthelaw/SmolLM2-360M-Instruct_Resume-SFT-DPO

A fine-tuned version of HuggingFaceTB/SmolLM2-360M-Instruct trained with a two-stage pipeline (SFT + DPO) to answer questions about Justin's professional background, skills, and experience.

Model Description

This model is designed for browser-based inference using transformers.js. It powers a personal website chatbot that can answer questions about Justin's resume, work experience, education, and skills.

Training Pipeline

The model is trained using a two-stage approach optimized for factual memorization:

  1. SFT (Supervised Fine-Tuning): Primary training for factual memorization using conversation-formatted QA pairs
  2. DPO (Direct Preference Optimization): Refinement training to prefer accurate answers over hallucinations
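The conversation-formatted QA pairs used in the SFT stage can be sketched as ChatML-style strings (SmolLM2-family models use a ChatML-like chat template; in practice the tokenizer's `apply_chat_template` would produce this, and the QA pair below is a hypothetical placeholder, not actual training data):

```python
def format_qa_pair(question: str, answer: str) -> str:
    """Render one SFT training example as a ChatML-style conversation.

    Assumption: the real pipeline uses tokenizer.apply_chat_template,
    which emits this format for SmolLM2-family models.
    """
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>\n"
    )

# Hypothetical example pair
example = format_qa_pair(
    "What is Justin's background?",
    "Justin is a software engineer.",  # placeholder answer
)
print(example)
```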

Training Details

SFT Training Configuration

  • Epochs: 5
  • Batch Size: 8
  • Learning Rate: 1e-4

DPO Training Configuration

  • Epochs: 1
  • Batch Size: 4
  • Learning Rate: 5e-6
  • Beta: 0.05
  • Loss Type: sigmoid
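The sigmoid loss type above is the standard DPO objective: it scores how much more the policy prefers the chosen (accurate) answer over the rejected (hallucinated) one relative to a frozen reference model, scaled by beta. A minimal pure-Python sketch of the per-example loss (the log-probability values are illustrative, not taken from the actual training run):

```python
import math

def dpo_sigmoid_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.05,  # matches the Beta value above
) -> float:
    """Per-example DPO loss with the sigmoid loss type:
    -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative values: the policy has learned to prefer the chosen answer,
# so the margin is positive and the loss falls below log(2) ~= 0.693.
loss = dpo_sigmoid_loss(-10.0, -14.0, -12.0, -12.0)
```

A smaller beta (0.05 here) tolerates larger divergence from the reference model before the loss penalizes it, which suits a short refinement pass after SFT.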

Model Formats

This repository contains multiple model formats:

Format           Location                     Use Case
SafeTensors      / (root)                     Python/PyTorch inference
ONNX             /onnx/model.onnx             Full-precision ONNX Runtime
ONNX Quantized   /onnx/model_quantized.onnx   Browser inference (transformers.js)

Note: If quantization fails during export due to weight distribution issues, model_quantized.onnx will be a copy of the fp16 model for compatibility.
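The fallback described in the note can be sketched as a try/except around the quantization step (`quantize_fn` is a hypothetical stand-in for the actual export tooling; the copy-on-failure behavior is the point):

```python
import shutil
from pathlib import Path

def export_quantized(fp16_path: Path, quantized_path: Path, quantize_fn) -> bool:
    """Try to quantize the fp16 ONNX model; on failure, copy the fp16
    model to the quantized filename so downstream loaders still find a
    file. Returns True only if real quantization succeeded."""
    try:
        quantize_fn(fp16_path, quantized_path)  # hypothetical quantization step
        return True
    except Exception:
        # Weight-distribution issues can make quantization fail; fall back
        # to shipping the fp16 weights under the quantized filename.
        shutil.copyfile(fp16_path, quantized_path)
        return False
```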

Usage

Browser (transformers.js)

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "justinthelaw/SmolLM2-360M-Instruct_Resume-SFT-DPO",
  { dtype: "q8" } // Uses model_quantized.onnx
);

const output = await generator("What is Justin's background?", {
  max_new_tokens: 256,
  temperature: 0.7,
});

Python (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "justinthelaw/SmolLM2-360M-Instruct_Resume-SFT-DPO"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Use the chat template, matching the conversation format the model was trained on
messages = [{"role": "user", "content": "What is Justin's background?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Intended Use

This model is intended for:

  • Personal website chatbots
  • Resume Q&A applications
  • Demonstrating fine-tuning techniques for personalized AI assistants

Limitations

  • The model is specifically trained on Justin's resume and may not generalize to other topics
  • Responses are based on training data and may not reflect real-time information
  • Not suitable for general-purpose question answering

Author

Justin

License

This model is released under the Apache 2.0 license.
