🍽️ Gemma-3-270M Restaurant Reservation NER Model


A fine-tuned version of Google's Gemma-3-270M that extracts restaurant reservation details (party size, date and time, and phone number) from user messages, with robust handling of noisy text produced by automatic speech recognition (ASR).

✨ Key Features

  • 🎯 Entity Extraction: Identifies three reservation fields: party size (num_people), date and time (reservation_date), and phone number (phone_num)
  • 🌐 Bilingual Support: Handles both Chinese and English input
  • 🎙️ ASR Robust: Optimized for noisy speech recognition output
  • 📱 Phone Focus: Specialized for Taiwanese mobile number extraction

📋 Comprehensive Example: All Three Entities

Complex Input Text:
"Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"

Extracted Output:

{
  "num_people": "5",
  "reservation_date": "15th of next month at 7:30 PM", 
  "phone_num": "0912345678"
}

Entity Breakdown from this Complex Example:

| Entity | Extracted From Input | Normalized Output |
| --- | --- | --- |
| num_people | "2 adults and 3 children" | "5" (summed total) |
| reservation_date | "15th of next month around 7:30 in the evening" | "15th of next month at 7:30 PM" (normalized time format) |
| phone_num | "+886-912-345-678" | "0912345678" (international format converted to local) |

⚠️ Important Note: Phone Number Handling

This model exclusively extracts Taiwanese 10-digit mobile numbers (09XXXXXXXX format):

Extracted: Mobile numbers with complex variations

  • "+886-912-345-678" → "0912345678" (international format)
  • "零九一二三四五六七八" → "0912345678" (Chinese characters)
  • "09 12 34 56 78" → "0912345678" (spaced format)

Ignored: Non-mobile numbers

  • "市話02-1234-5678" → "" (landline)
  • "國際電話+1-555-123-4567" → "" (international non-Taiwanese)
  • "免付費0800-123-456" → "" (toll-free)
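
The accept/reject rules above can also be approximated with a rule-based check, for example to validate the phone_num value the model returns. The sketch below is not the model's internal method. The function name normalize_tw_mobile and the regular expressions are assumptions made for illustration, and the sketch only handles the "+886" prefix, Chinese numerals, spaces, and hyphens.

```python
import re

# Translate Chinese numerals to ASCII digits (零 and 〇 both mean zero).
CHINESE_DIGITS = str.maketrans("零〇一二三四五六七八九", "00123456789")


def normalize_tw_mobile(text: str) -> str:
    """Return the first Taiwanese mobile number in text as 09XXXXXXXX, or ""."""
    converted = text.translate(CHINESE_DIGITS)
    # Candidate runs: an optional "+", then digits separated by spaces or hyphens.
    for match in re.finditer(r"\+?\d[\d\s\-]*", converted):
        candidate = match.group()
        digits = re.sub(r"\D", "", candidate)
        # Convert the international prefix +886 to the local leading 0.
        if candidate.startswith("+886"):
            digits = "0" + digits[3:]
        # Accept only 10-digit numbers that start with 09.
        if re.fullmatch(r"09\d{8}", digits):
            return digits
    return ""


print(normalize_tw_mobile("you can reach me at +886-912-345-678"))
print(normalize_tw_mobile("市話02-1234-5678"))
```

A check like this could run after generation to reject a phone_num that does not match the 09XXXXXXXX pattern.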

🚀 Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json

# Load model and tokenizer
model_name = "Luigi/dinercall-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

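# System prompt (kept in the original Chinese). English gloss:
# "You are an assistant that extracts reservation information from user messages
#  and outputs it as JSON. The JSON must contain three fields: num_people,
#  reservation_date, phone_num. If a field has no information, use an empty
#  string. Output only the JSON and add no other text."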
system_prompt = """你是一個助理,負責從用戶消息中提取預訂資訊並以JSON格式輸出。
JSON必須包含三個字段: num_people, reservation_date, phone_num。
如果某個字段沒有信息,使用空字符串。只輸出JSON,不要添加任何其他文字。"""

# Example with complex input
user_input = "Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input}
]

# Generate response
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
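# The chat template already inserts the BOS token, so do not add special tokens again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)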

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False  # greedy decoding; temperature has no effect when sampling is off
    )

# Process output
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
result = json.loads(response)
print(result)
# Output: {"num_people": "5", "reservation_date": "15th of next month at 7:30 PM", "phone_num": "0912345678"}
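
The json.loads call above raises an exception if the model emits anything other than a clean JSON object. A minimal defensive parser could look like the sketch below. The helper name parse_reservation and its fallback behavior (empty strings for missing or unparseable fields) are assumptions for illustration, not part of the released model.

```python
import json
import re

EXPECTED_KEYS = ("num_people", "reservation_date", "phone_num")


def parse_reservation(response: str) -> dict:
    """Parse model text into a dict holding the three expected string fields."""
    empty = {key: "" for key in EXPECTED_KEYS}
    # Take the span from the first "{" to the last "}" so extra text is ignored.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return empty
    # Keep only the expected keys and coerce every value to a string.
    result = {}
    for key in EXPECTED_KEYS:
        value = data.get(key)
        result[key] = "" if value is None else str(value)
    return result


print(parse_reservation('Sure! {"num_people": 5, "phone_num": "0912345678"}'))
```

Returning the same three keys in every case keeps downstream code free of key checks.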

🎯 Use Cases

Perfect For

  • 🗣️ Voice assistant reservation systems
  • 🤖 Chatbot booking interfaces
  • 📞 Call center automation
  • 📱 Mobile app reservation features

Ideal Input Types

  • English: "Book for 6 people next Friday at 8 PM"
  • Chinese: "預約明天晚上7點,四位成人" ("Reserve for tomorrow at 7 PM, four adults")
  • Mixed: "我想book 4位,tomorrow at 7 PM" ("I'd like to book for 4, tomorrow at 7 PM")

📊 Training Details

Dataset

  • Source: dinercall-ner dataset
  • Samples: 20,000 synthetic reservation requests
  • Language: 70% Chinese, 30% English
  • Features: ASR noise simulation, realistic error patterns
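
The card does not describe how the ASR noise was generated. As a purely hypothetical illustration of the idea, the sketch below injects two error types that resemble the noisy examples in this card: repeated words ("for for") and doubled characters ("adullts"). The function add_asr_noise, its rate parameter, and the choice of error types are assumptions, and the sketch only works on whitespace-separated text, so it does not cover unsegmented Chinese.

```python
import random


def add_asr_noise(text, rate=0.1, seed=None):
    """Randomly repeat words or double a character to mimic ASR-style errors."""
    rng = random.Random(seed)
    noisy_words = []
    for word in text.split():
        roll = rng.random()
        if roll < rate / 2:
            # Stutter: emit the word twice.
            noisy_words.extend([word, word])
        elif roll < rate and len(word) > 2:
            # Double one character inside the word.
            i = rng.randrange(1, len(word))
            noisy_words.append(word[:i] + word[i - 1] + word[i:])
        else:
            noisy_words.append(word)
    return " ".join(noisy_words)


print(add_asr_noise("Book for 2 adults and 1 child next Friday at 6:45 PM", rate=0.5, seed=0))
```

Fixing the seed makes the corruption reproducible, which helps when comparing model versions on the same noisy inputs.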

Configuration

| Parameter | Value |
| --- | --- |
| Base Model | unsloth/gemma-3-270m-it-unsloth-bnb-4bit |
| Max Sequence Length | 256 tokens |
| Learning Rate | 2e-5 |
| Batch Size | 4 (gradient accumulation: 2) |
| Training Epochs | 10 |
| LoRA Rank | 32 |
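
The batch size and gradient accumulation values combine into an effective batch size, which in turn determines the number of optimizer steps. The short calculation below works this out. It assumes all 20,000 samples were used for training on a single device, which the card does not state.

```python
import math

per_device_batch_size = 4
gradient_accumulation_steps = 2
num_epochs = 10
num_train_samples = 20_000  # assumption: no held-out split, single device

# Gradients are accumulated over 2 batches of 4 before each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 8 2500 25000
```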

📝 Advanced Examples & Outputs

Complex Input Examples with Outputs

# Example 1: Complex English with mixed formatting
input_text = "Could you please reserve a table for 3 adults and 2 children on December 24th around 8 PM? My contact is +886-987-654-321"
output = {
  "num_people": "5",
  "reservation_date": "December 24th at 8 PM",
  "phone_num": "0987654321"
}

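# Example 2: Chinese with complex date and mixed digits
# English gloss: "We'd like to reserve the 15th of next month at 7:30 PM, 4 adults and 2 children, phone is 0987-654321"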
input_text = "我們想要預約下個月15號晚上7點半,4大2小,電話是零九八七-六五四三二一"
output = {
  "num_people": "6",
  "reservation_date": "下個月15號晚上7點半",
  "phone_num": "0987654321"
}

# Example 3: Noisy ASR input with complex elements
input_text = "Book for for 2 adullts and 1 childreen onn nexts Friday at 6:45 PM, fone 09八七六五四三二一"
output = {
  "num_people": "3",
  "reservation_date": "next Friday at 6:45 PM",
  "phone_num": "0987654321"
}

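# Example 4: Mixed language with complex request
# English gloss: "I'd like to book for 3 adults and 2 children, time is next Wednesday at 7:30 PM, contact number is 0912-345-678"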
input_text = "我想book 3大人2小孩,time是next Wednesday at 7:30 PM,contact number是0912-345-678"
output = {
  "num_people": "5",
  "reservation_date": "next Wednesday at 7:30 PM",
  "phone_num": "0912345678"
}
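
Input/output pairs like the ones above can double as a small regression set. The sketch below computes exact-match accuracy per field. The function field_accuracy is a hypothetical helper, and the second prediction is deliberately made wrong to show a miss; it is not real model output.

```python
FIELDS = ("num_people", "reservation_date", "phone_num")


def field_accuracy(predictions, references, fields=FIELDS):
    """Return exact-match accuracy per field over paired dicts."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = {field: 0 for field in fields}
    for pred, ref in zip(predictions, references):
        for field in fields:
            if pred.get(field, "") == ref.get(field, ""):
                correct[field] += 1
    total = len(references)
    return {field: (correct[field] / total if total else 0.0) for field in fields}


# Expected outputs taken from Examples 1 and 3 above.
references = [
    {"num_people": "5", "reservation_date": "December 24th at 8 PM", "phone_num": "0987654321"},
    {"num_people": "3", "reservation_date": "next Friday at 6:45 PM", "phone_num": "0987654321"},
]
# Hypothetical predictions: the second one drops "next" from the date.
predictions = [
    {"num_people": "5", "reservation_date": "December 24th at 8 PM", "phone_num": "0987654321"},
    {"num_people": "3", "reservation_date": "Friday at 6:45 PM", "phone_num": "0987654321"},
]

scores = field_accuracy(predictions, references)
print(scores)  # {'num_people': 1.0, 'reservation_date': 0.5, 'phone_num': 1.0}
```

Exact match is strict for reservation_date, where several phrasings can be equally valid, so a looser comparison may suit that field better.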

⚠️ Limitations & Considerations

Technical Limitations

  • 🎯 Phone Numbers: Only Taiwanese mobile numbers (09XXXXXXXX)
  • 🌍 Geography: Optimized for Taiwanese reservation patterns
  • 🎙️ ASR Types: Best performance on simulated ASR errors similar to training data
  • 💬 Language Mix: Handles Chinese/English mixing but may struggle with other languages

Ethical Considerations

  • 🔒 Privacy: Only extracts mobile numbers; landline numbers are ignored
  • 📋 Consent: Ensure proper user consent for data processing
  • ⚖️ Compliance: Follow local regulations for data handling

📚 Citation

If you use this model in your research, please cite:

@software{dinercall_ner_model_2025,
  author = {Luigi},
  title = {Gemma-3-270M Fine-tuned for Restaurant Reservation NER},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Luigi/dinercall-ner}
}

🆘 Support

For questions, issues, or contributions:

  • 📧 Open an issue on the Hugging Face repository
  • 💬 Check the examples above for common usage patterns
  • 🔧 Review the limitations section before deployment

📄 License

This model inherits the license terms of the base Gemma model. Please review Google's license terms for specific usage rights and restrictions.
