EmpathAI Llama 3.1 8B

EmpathAI Llama 3.1 8B is a Vietnamese customer-support assistant model fine-tuned for empathetic, policy-aware e-commerce conversations. The model was trained with supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO) using the thanhhoangnvbg/empathAI-dpo-vi dataset.

The default main branch contains the merged BF16 full-weight export in sharded safetensors format. This is the recommended branch for Featherless AI / Hugging Face Inference Providers because it contains full merged weights rather than LoRA, QLoRA, GGUF, or bitsandbytes-only files.

Branch Layout

| Branch | Contents | Intended use |
|---|---|---|
| main | Merged BF16 full weights, sharded safetensors | Featherless AI / HF Inference Providers / Transformers |
| v2-gguf | GGUF Q4_K_M and Q5_K_M exports | llama.cpp, Ollama, local inference |
| v2-adapter | LoRA adapter only | Re-merge, continued training, adapter-based loading |
| v1-bf16 | Archived v1 merged weights | Reproducibility |
| v1-gguf | Archived v1 GGUF exports | Reproducibility |
| old_version | Older archived full-weight branch | Reproducibility |
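
Each branch is selected through the `revision` argument of the Hub download functions. The following is a minimal sketch, assuming `huggingface_hub` is installed. The names `BRANCH_FOR`, `pick_gguf_file`, and `download_gguf` are hypothetical helpers written for this example, and because this card does not list the GGUF filenames, the sketch searches the branch's file listing instead of hard-coding one.

```python
from typing import Iterable, Optional

REPO_ID = "thanhhoangnvbg/empathAI-llama3.1-8b"

# Branch names come from the Branch Layout table; the dictionary keys are
# hypothetical labels chosen for this sketch.
BRANCH_FOR = {
    "transformers": "main",
    "gguf": "v2-gguf",
    "adapter": "v2-adapter",
}


def pick_gguf_file(files: Iterable[str], quant: str = "Q4_K_M") -> Optional[str]:
    """Return the first .gguf filename (in sorted order) containing the quant tag."""
    quant = quant.lower()
    for name in sorted(files):
        lowered = name.lower()
        if lowered.endswith(".gguf") and quant in lowered:
            return name
    return None


def download_gguf(quant: str = "Q4_K_M") -> str:
    """Download one GGUF file from the v2-gguf branch and return its local path."""
    # Imported lazily so the pure helper above works without huggingface_hub.
    from huggingface_hub import hf_hub_download, list_repo_files

    revision = BRANCH_FOR["gguf"]
    filename = pick_gguf_file(list_repo_files(REPO_ID, revision=revision), quant)
    if filename is None:
        raise FileNotFoundError(f"No {quant} GGUF file found on branch {revision}")
    return hf_hub_download(REPO_ID, filename=filename, revision=revision)
```

Calling `download_gguf("Q5_K_M")` would fetch the Q5_K_M export if a file with that tag exists on the branch; the returned path can then be passed to a llama.cpp-compatible runtime.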

Model Details

  • Base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
  • Architecture family: Llama 3.1 8B Instruct
  • Language: Vietnamese
  • Fine-tuning methods: SFT, then DPO
  • Training framework: Unsloth + TRL + Transformers
  • Export on main: merged BF16 full weights in safetensors
  • Adapter branch: v2-adapter
  • GGUF branch: v2-gguf

Intended Use

This model is intended for Vietnamese e-commerce customer-service conversations, especially cases requiring calm tone, empathy, privacy awareness, and refusal to fabricate order status or compensation decisions.

Examples of intended behavior:

  • acknowledge customer frustration without escalating tone;
  • ask for non-sensitive order information when needed;
  • avoid requesting OTPs, passwords, full card numbers, or full identity documents;
  • avoid inventing delivery, refund, or warranty status without access to backend systems;
  • redirect sensitive account operations to official support channels.

Out-of-Scope Use

This model should not be used as a source of truth for order status, payment status, or refund eligibility, nor for medical, legal, or financial advice. It does not have access to real business systems unless integrated with verified tools. Production deployments should add retrieval, policy checks, logging, human escalation, and safety filters.
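
As one illustration of a policy check, the sketch below flags model replies that mention sensitive credentials so they can be routed to human review before being sent. It is not part of this model or its training pipeline: the phrase list and the function name `needs_review` are hypothetical, and a keyword match like this over-flags (it also catches replies that warn the customer not to share an OTP), so it should be treated as a starting point only.

```python
import re
import unicodedata

# Hypothetical phrase list for illustration; a real deployment would maintain
# its own reviewed patterns and combine them with other safeguards.
SENSITIVE_PATTERNS = [
    r"\botp\b",
    r"mật khẩu",  # "password" in Vietnamese
    r"\bpassword\b",
    r"\bcvv\b",
    r"số thẻ đầy đủ",  # "full card number" in Vietnamese
]

# Normalize patterns to NFC so they match NFC-normalized replies.
_SENSITIVE_RE = re.compile(
    "|".join(unicodedata.normalize("NFC", p) for p in SENSITIVE_PATTERNS)
)


def needs_review(reply: str) -> bool:
    """Return True if the reply mentions any sensitive credential keyword."""
    text = unicodedata.normalize("NFC", reply).lower()
    return _SENSITIVE_RE.search(text) is not None


if __name__ == "__main__":
    print(needs_review("Anh/chị vui lòng gửi mã OTP để em kiểm tra."))
    print(needs_review("Anh/chị cho mình xin mã đơn hàng nhé."))
```

A check like this would typically sit after `model.generate` and before the reply is shown to the customer, alongside logging and escalation.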

Training Data

Dataset: thanhhoangnvbg/empathAI-dpo-vi

Observed split sizes used in this run:

| Split file | Rows | Purpose |
|---|---|---|
| sft_train.jsonl | 7,982 | SFT train |
| sft_dev.jsonl | 1,016 | SFT validation |
| sft_test.jsonl | 1,002 | SFT test |
| dpo_train.jsonl | 5,139 | DPO train |
| dpo_dev.jsonl | 651 | DPO validation |
| dpo_test.jsonl | 664 | DPO test |
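
The row counts above imply roughly an 80/10/10 train/dev/test split for both stages. The short calculation below makes that explicit; it uses only the numbers listed in this section, and the helper name `split_shares` is chosen for this example.

```python
# Row counts copied from the split table in this section.
SPLITS = {
    "sft": {"train": 7982, "dev": 1016, "test": 1002},
    "dpo": {"train": 5139, "dev": 651, "test": 664},
}


def split_shares(counts):
    """Return each split's share of the stage total, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


for stage, counts in SPLITS.items():
    print(stage, sum(counts.values()), split_shares(counts))
```

The SFT files total 10,000 rows and the DPO files total 6,454 rows.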

Training Configuration

Training was run on an NVIDIA L4 GPU with CUDA 12.4.

Key settings:

| Stage | Setting | Value |
|---|---|---|
| Shared | LoRA rank / alpha | r=32, lora_alpha=32 |
| Shared | LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Shared | Precision | BF16 where supported |
| SFT | Epochs | 3 |
| SFT | Max length | 1024 |
| SFT | Batch / grad accumulation | 1 / 8 |
| SFT | Learning rate | 2e-4 |
| DPO | Epochs | 1 |
| DPO | Max length / prompt length | 1536 / 1024 |
| DPO | Batch / grad accumulation | 1 / 8 |
| DPO | Learning rate | 3e-5 |
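
To show what these settings imply, the sketch below derives the effective batch size, an approximate optimizer step count, the LoRA scaling factor, and the number of trainable LoRA parameters. Several things here are assumptions rather than facts from this card: a single GPU with no sequence packing, the standard LoRA scaling of alpha / r (not rsLoRA), and the Llama 3.1 8B layer shapes (hidden size 4096, MLP size 14336, 32 layers, key/value projection width 1024) taken from the public base-model configuration. The actual step count logged by the trainer may differ slightly.

```python
import math

# Values from the Training Configuration and Training Data tables.
SFT = {"rows": 7982, "epochs": 3, "batch": 1, "grad_accum": 8}
DPO = {"rows": 5139, "epochs": 1, "batch": 1, "grad_accum": 8}
LORA_R, LORA_ALPHA = 32, 32


def optimizer_steps(cfg, num_gpus=1):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = cfg["batch"] * cfg["grad_accum"] * num_gpus
    steps_per_epoch = math.ceil(cfg["rows"] / effective_batch)
    return effective_batch, steps_per_epoch * cfg["epochs"]


# Assumed Llama 3.1 8B shapes (input dim, output dim) for each LoRA target.
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 4096, 14336, 1024, 32
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank):
    """Each adapted matrix adds rank * (d_in + d_out) trainable parameters."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * LAYERS


print(optimizer_steps(SFT))      # (8, 2994)
print(optimizer_steps(DPO))      # (8, 643)
print(LORA_ALPHA / LORA_R)       # 1.0
print(lora_param_count(LORA_R))  # 83886080
```

Under these assumptions the adapter holds about 84 million trainable parameters, roughly 1% of the 8B base model.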

Runtime stack:

  • Unsloth 2026.6.1
  • Transformers 5.5.0
  • TRL 0.24.0
  • PyTorch 2.5.1+cu124

Evaluation

These metrics are internal validation/test metrics on the same dataset distribution, not an external benchmark.

SFT

| Metric | Value |
|---|---|
| Train runtime | 11,450 seconds |
| Train loss | 4.044 |
| Final dev eval loss | 0.4424 |
| SFT test loss | 0.4120 |

DPO

| Metric | Value |
|---|---|
| Train runtime | 13,280 seconds |
| Train loss | 0.0002717 |
| Final dev eval loss | 0.002075 |
| DPO test loss | 0.0006332 |
| DPO test reward accuracy | 1.0000 |
| DPO test chosen reward | 11.4892 |
| DPO test rejected reward | -6.4002 |
| DPO test reward margin | 17.8894 |

The DPO reward accuracy (1.0000) and reward margin (17.8894) are very high. This means the model strongly separates chosen and rejected answers on the held-out DPO split. Because that split is drawn from the same distribution as the training data, these numbers should not be interpreted as a full real-world benchmark.
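
The reported margin is simply the chosen reward minus the rejected reward, which the sketch below checks. It also evaluates the per-example loss under the assumption that training used the default sigmoid DPO loss, where the loss for one pair is -log(sigmoid(margin)); this card does not state the loss type, so treat that part as illustrative.

```python
import math

# Values from the DPO evaluation table.
CHOSEN_REWARD = 11.4892
REJECTED_REWARD = -6.4002


def reward_margin(chosen, rejected):
    """Margin between the implicit rewards of the chosen and rejected answers."""
    return chosen - rejected


def dpo_sigmoid_loss(margin):
    """-log(sigmoid(margin)), computed in a numerically stable form."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


margin = reward_margin(CHOSEN_REWARD, REJECTED_REWARD)
print(round(margin, 4))          # 17.8894
print(dpo_sigmoid_loss(margin))  # on the order of 1e-8
print(dpo_sigmoid_loss(0.0))     # log(2), about 0.693: loss when the model is indifferent
```

Under that assumption, the loss evaluated at the mean margin is far smaller than the reported mean test loss of 0.0006332. Since the loss is averaged per example, this indicates that margins vary across pairs and some fall well below the mean.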

How to Use

import torch

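# Optional compatibility patch for torch 2.5.x + torchao 0.15.x environments.
# It aliases any missing sub-byte dtypes (torch.int1..int7, torch.uint1..uint7)
# to 8-bit types so attribute lookups succeed. Skip it if imports work without it.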
for i in range(1, 8):
    if not hasattr(torch, f"int{i}"):
        setattr(torch, f"int{i}", torch.int8)
    if not hasattr(torch, f"uint{i}"):
        setattr(torch, f"uint{i}", torch.uint8)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thanhhoangnvbg/empathAI-llama3.1-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="main")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="main",
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": """Bạn là EmpathAI, trợ lý chăm sóc khách hàng e-commerce tiếng Việt.

Nguyên tắc:
- Không tự suy đoán trạng thái đơn hàng.
- Không tự suy đoán chính sách hoàn tiền, đổi trả hoặc bồi thường.
- Khi thiếu thông tin, hãy nói rõ rằng chưa thể xác nhận.
- Chỉ trả lời dựa trên thông tin được cung cấp.
- Nếu không đủ dữ liệu, hãy yêu cầu thông tin bổ sung hoặc hướng dẫn khách liên hệ bộ phận hỗ trợ.
- Trả lời ngắn gọn, lịch sự và đồng cảm.

Khi khách hỏi về:
- trạng thái đơn hàng
- hoàn tiền
- bồi thường
- đổi trả

mà chưa có đủ thông tin, hãy ưu tiên các cách diễn đạt:
- "Tôi chưa thể xác nhận từ thông tin hiện có."
- "Mình cần thêm thông tin để kiểm tra."
- "Tôi không có đủ dữ liệu để xác nhận điều đó."
""",
    },
    {
        "role": "user",
        "content": "Đơn hàng của tôi giao trễ 3 ngày rồi, shop có hoàn tiền không?",
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
# Move inputs to the device chosen for the model by device_map="auto".
inputs = {key: value.to(model.device) for key, value in inputs.items()}

outputs = model.generate(
    **inputs,
    max_new_tokens=180,
    temperature=0.4,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))

Deployment Notes

  • Use main for Featherless AI and HF Inference Providers.
  • Use v2-gguf for llama.cpp/Ollama-style local inference.
  • Use v2-adapter only if you want the LoRA adapter and plan to load it with the base model or re-merge it yourself.
  • The default branch is intentionally BF16 full weights, not bitsandbytes 4-bit, because Featherless AI expects full safetensors weights and handles serving-side optimization.

Limitations

  • The model is specialized for Vietnamese e-commerce support and may underperform outside this domain.
  • The model can still hallucinate if asked for facts not present in context.
  • It cannot verify real orders, refunds, shipping status, identity, or payment state without external tools.
  • Internal DPO metrics are strong but do not replace external evaluation or human review.
  • Production use should include policy enforcement, PII handling, monitoring, and escalation paths.