🍽️ Gemma-3-270M Restaurant Reservation NER Model

A fine-tuned version of Google's Gemma-3-270M that extracts restaurant reservation details from user messages and is robust to noisy text produced by automatic speech recognition (ASR).

✨ Key Features

🎯 Entity Extraction: Identifies three key reservation elements
🌐 Bilingual Support: Handles both Chinese and English input
🎙️ ASR Robust: Optimized for noisy speech recognition output
📱 Phone Focus: Specialized for Taiwanese mobile number extraction

📋 Comprehensive Example: All Three Entities

Complex Input Text:
"Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"

Extracted Output:

{
  "num_people": "5",
  "reservation_date": "15th of next month at 7:30 PM",
  "phone_num": "0912345678"
}

Entity Breakdown from this Complex Example:

| Entity | Extracted From Input | Normalized Output |
|---|---|---|
| `num_people` | `"2 adults and 3 children"` | `"5"` (summed total) |
| `reservation_date` | `"15th of next month around 7:30 in the evening"` | `"15th of next month at 7:30 PM"` (normalized time format) |
| `phone_num` | `"+886-912-345-678"` | `"0912345678"` (international format converted to local) |

⚠️ Important Note: Phone Number Handling

This model exclusively extracts Taiwanese 10-digit mobile numbers (09XXXXXXXX format):

✅ Extracted: Mobile numbers with complex variations

"+886-912-345-678" → 0912345678 (international format)
"零九一二三四五六七八" → 0912345678 (Chinese characters)
"09 12 34 56 78" → 0912345678 (spaced format)

❌ Ignored: Non-mobile numbers

"市話02-1234-5678" → "" (landline)
"國際電話+1-555-123-4567" → "" (international non-Taiwanese)
"免付費0800-123-456" → "" (toll-free)
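The rules above can also be checked independently of the model, for example as a sanity check on `phone_num` before storing it. The following is a minimal rule-based sketch, not the model's own logic; the function name `normalize_tw_mobile` and the regular expressions are illustrative assumptions.

```python
import re

# Map Chinese numerals to ASCII digits (both 零 and 〇 are used for zero).
_CN_DIGITS = str.maketrans("零〇一二三四五六七八九", "00123456789")


def normalize_tw_mobile(text: str) -> str:
    """Return the first 09XXXXXXXX mobile number found in text, or "" if none."""
    s = text.translate(_CN_DIGITS)
    # Examine each run of digits that may contain spaces or hyphens,
    # optionally preceded by "+" for an international prefix.
    for candidate in re.findall(r"\+?\d[\d\s-]*", s):
        digits = re.sub(r"\D", "", candidate)
        if candidate.startswith("+"):
            if not digits.startswith("886"):
                continue  # international number outside Taiwan
            digits = "0" + digits[3:]  # +886 9XX... becomes 09XX...
        if re.fullmatch(r"09\d{8}", digits):
            return digits
    return ""


print(normalize_tw_mobile("+886-912-345-678"))  # 0912345678
print(normalize_tw_mobile("零九一二三四五六七八"))  # 0912345678
print(repr(normalize_tw_mobile("市話02-1234-5678")))  # ''
```

This sketch treats any space-separated run of digits as one candidate, so a stray digit directly before the number (for example "at 7 0912345678") would cause a miss; the fine-tuned model is meant to handle such context.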

🚀 Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json

# Load model and tokenizer
model_name = "Luigi/dinercall-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

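# System prompt, kept in the original Chinese. English translation:
# "You are an assistant responsible for extracting reservation information from
# user messages and outputting it in JSON format. The JSON must contain three
# fields: num_people, reservation_date, phone_num. If a field has no information,
# use an empty string. Output only JSON; do not add any other text."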
system_prompt = """你是一個助理，負責從用戶消息中提取預訂資訊並以JSON格式輸出。
JSON必須包含三個字段: num_people, reservation_date, phone_num。
如果某個字段沒有信息，使用空字符串。只輸出JSON，不要添加任何其他文字。"""

# Example with complex input
user_input = "Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input}
]

# Generate response
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False  # greedy decoding; temperature has no effect without sampling
    )

# Process output
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
result = json.loads(response)
print(result)
# Expected output: {'num_people': '5', 'reservation_date': '15th of next month at 7:30 PM', 'phone_num': '0912345678'}
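Because the model generates free text, `json.loads` can raise an error if the output is ever truncated or wrapped in extra words. A defensive parser along the following lines may be useful in production; this is a sketch under that assumption, and `parse_reservation` and `EXPECTED_KEYS` are hypothetical names, not part of the model's API.

```python
import json
import re

EXPECTED_KEYS = ("num_people", "reservation_date", "phone_num")


def parse_reservation(response: str) -> dict:
    """Parse model output into a dict with exactly the three expected keys."""
    empty = {key: "" for key in EXPECTED_KEYS}
    # Take the outermost {...} span in case extra text surrounds the JSON.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    # Keep only the expected keys; missing or null values become "".
    return {
        key: "" if data.get(key) is None else str(data[key])
        for key in EXPECTED_KEYS
    }


print(parse_reservation('Sure! {"num_people": 5, "phone_num": "0912345678"}'))
# {'num_people': '5', 'reservation_date': '', 'phone_num': '0912345678'}
```

Returning empty strings on failure mirrors the convention in the system prompt, where a missing field is represented by an empty string.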

🎯 Use Cases

Perfect For

🗣️ Voice assistant reservation systems
🤖 Chatbot booking interfaces
📞 Call center automation
📱 Mobile app reservation features

Ideal Input Types

English: "Book for 6 people next Friday at 8 PM"
Chinese: "預約明天晚上7點，四位成人" (Reserve for tomorrow at 7 PM, four adults)
Mixed: "我想book 4位，tomorrow at 7 PM" (I'd like to book for 4, tomorrow at 7 PM)

📊 Training Details

Dataset

Source: dinercall-ner dataset
Samples: 20,000 synthetic reservation requests
Language: 70% Chinese, 30% English
Features: ASR noise simulation, realistic error patterns
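The exact noise-generation procedure for the dataset is not described here. Purely as an illustration of what ASR-style noise can look like (repeated words and doubled letters, as in "Book for for 2 adullts"), here is a small sketch; the function `add_asr_noise`, its parameters, and the default probabilities are all assumptions and not the dataset's actual method.

```python
import random
from typing import Optional


def add_asr_noise(text: str, word_repeat_prob: float = 0.1,
                  char_double_prob: float = 0.05,
                  seed: Optional[int] = None) -> str:
    """Randomly double ASCII letters and repeat words to mimic ASR errors."""
    rng = random.Random(seed)
    noisy_words = []
    for word in text.split(" "):
        chars = []
        for ch in word:
            chars.append(ch)
            # Only ASCII letters are doubled; digits, punctuation, and
            # Chinese characters are left untouched.
            if ch.isascii() and ch.isalpha() and rng.random() < char_double_prob:
                chars.append(ch)
        noisy = "".join(chars)
        noisy_words.append(noisy)
        # Occasionally repeat the whole word, like a stutter in a transcript.
        if noisy and rng.random() < word_repeat_prob:
            noisy_words.append(noisy)
    return " ".join(noisy_words)


print(add_asr_noise("Book for 2 adults next Friday at 6:45 PM", seed=0))
```

Passing a fixed `seed` makes the noise reproducible, which helps when regenerating the same evaluation set.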

Configuration

| Parameter | Value |
|---|---|
| Base Model | `unsloth/gemma-3-270m-it-unsloth-bnb-4bit` |
| Max Sequence Length | 256 tokens |
| Learning Rate | 2e-5 |
| Batch Size | 4 (gradient accumulation: 2) |
| Training Epochs | 10 |
| LoRA Rank | 32 |
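As a rough guide to what these settings imply, the arithmetic below derives the effective batch size and the number of optimizer steps. It assumes that all 20,000 samples are used for training on a single device with no held-out split, which is not stated in this card.

```python
import math

# Values taken from the Dataset and Configuration sections.
num_samples = 20_000
per_device_batch_size = 4
gradient_accumulation_steps = 2
num_epochs = 10

# One optimizer step consumes batch_size * accumulation_steps samples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 8 2500 25000
```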

📝 Advanced Examples & Outputs

Complex Input Examples with Outputs

# Example 1: Complex English with mixed formatting
input_text = "Could you please reserve a table for 3 adults and 2 children on December 24th around 8 PM? My contact is +886-987-654-321"
output = {
  "num_people": "5",
  "reservation_date": "December 24th at 8 PM",
  "phone_num": "0987654321"
}

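# Example 2: Chinese with complex date and mixed digits
# (English: "We'd like to reserve the 15th of next month at 7:30 PM, 4 adults and 2 children; the phone number is 0987-654321")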
input_text = "我們想要預約下個月15號晚上7點半，4大2小，電話是零九八七-六五四三二一"
output = {
  "num_people": "6",
  "reservation_date": "下個月15號晚上7點半",
  "phone_num": "0987654321"
}

# Example 3: Noisy ASR input with complex elements
input_text = "Book for for 2 adullts and 1 childreen onn nexts Friday at 6:45 PM, fone 09八七六五四三二一"
output = {
  "num_people": "3",
  "reservation_date": "next Friday at 6:45 PM",
  "phone_num": "0987654321"
}

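# Example 4: Mixed language with complex request
# (English: "I'd like to book for 3 adults and 2 children, the time is next Wednesday at 7:30 PM, contact number is 0912-345-678")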
input_text = "我想book 3大人2小孩，time是next Wednesday at 7:30 PM，contact number是0912-345-678"
output = {
  "num_people": "5",
  "reservation_date": "next Wednesday at 7:30 PM",
  "phone_num": "0912345678"
}
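Input/output pairs like the ones above can double as a small regression set. The sketch below computes per-field exact-match accuracy for any extraction function; `field_accuracy` and the `extract` callable are hypothetical names, and exact string match is only one possible scoring choice (it is strict for free-form dates).

```python
from typing import Callable, Dict, Iterable, Tuple

FIELDS = ("num_people", "reservation_date", "phone_num")


def field_accuracy(
    examples: Iterable[Tuple[str, Dict[str, str]]],
    extract: Callable[[str], Dict[str, str]],
) -> Dict[str, float]:
    """Return the exact-match accuracy of `extract` for each field."""
    correct = {field: 0 for field in FIELDS}
    total = 0
    for text, expected in examples:
        predicted = extract(text)
        total += 1
        for field in FIELDS:
            # A missing key counts as an empty string on both sides.
            if predicted.get(field, "") == expected.get(field, ""):
                correct[field] += 1
    return {field: (correct[field] / total if total else 0.0) for field in FIELDS}


# Usage with a stand-in extractor; replace it with a function that calls the model.
examples = [
    ("Book for 6 people next Friday at 8 PM",
     {"num_people": "6", "reservation_date": "next Friday at 8 PM", "phone_num": ""}),
]
print(field_accuracy(examples, lambda text: {"num_people": "6"}))
# {'num_people': 1.0, 'reservation_date': 0.0, 'phone_num': 1.0}
```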

⚠️ Limitations & Considerations

Technical Limitations

🎯 Phone Numbers: Only Taiwanese mobile numbers (09XXXXXXXX)
🌍 Geography: Optimized for Taiwanese reservation patterns
🎙️ ASR Types: Performs best on ASR errors that resemble the simulated noise in the training data
💬 Language Mix: Handles Chinese/English mixing but may struggle with other languages

Ethical Considerations

🔒 Privacy: Only extracts mobile numbers; landline numbers are ignored
📋 Consent: Ensure proper user consent for data processing
⚖️ Compliance: Follow local regulations for data handling

📚 Citation

If you use this model in your research, please cite:

@software{dinercall_ner_model_2025,
  author = {Luigi},
  title = {Gemma-3-270M Fine-tuned for Restaurant Reservation NER},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Luigi/dinercall-ner}
}