---
language: en
license: mit
tags:
- token-classification
- named-entity-recognition
- ner
- contact-management
- roberta
base_model: roberta-base
datasets:
- custom
model-index:
- name: assistant-bot-ner-model
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: accuracy
      value: 0.951
      name: Accuracy
    - type: f1
      value: 0.946
      name: F1 Score
---

# NER Model for Contact Management Assistant Bot

This model is a fine-tuned RoBERTa-base model for Named Entity Recognition (NER) in contact management tasks.

## Model Description

- **Developed by:** Mykyta Kotenko
- **Base Model:** [roberta-base](https://huggingface.co/roberta-base) by Facebook AI
- **Task:** Token Classification (Named Entity Recognition)
- **Language:** English
- **License:** MIT
- **Accuracy:** 95.1%
- **Entity Accuracy:** 93.7%
- **F1 Score:** 94.6%

## Supported Entities

This model extracts the following entity types:

- **NAME**: Person's full name
- **PHONE**: Phone numbers in various formats
- **EMAIL**: Email addresses
- **ADDRESS**: Full street addresses (including building numbers, street names, apartments, cities, states, ZIP codes)
- **BIRTHDAY**: Dates of birth
- **TAG**: Contact tags
- **NOTE_TEXT**: Note content
- **ID**: Contact/note identifiers
- **DAYS**: Time periods

## Usage

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("kms-engineer/assistant-bot-ner-model")
model = AutoModelForTokenClassification.from_pretrained("kms-engineer/assistant-bot-ner-model")

# Create NER pipeline
ner_pipeline = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"  # Merge B-/I- tokens
)

# Extract entities
text = "Add contact John Smith 212-555-0123 john@example.com 123 Broadway, New York"
results = ner_pipeline(text)

for result in results:
    print(f"{result['entity_group']}: {result['word']}")
```

**Output:**

```
NAME: John Smith
PHONE: 212-555-0123
EMAIL: john@example.com
ADDRESS: 123 Broadway, New York
```

### Advanced Usage with Address Recognition

```python
# Example with full address including building number
text = "Add contact Alon 212-555-0123 alon@example.com 45, 5 Ave, unit 34, New York"
results = ner_pipeline(text)

for result in results:
    print(f"{result['entity_group']}: {result['word']}")
```

**Output:**

```
NAME: Alon
PHONE: 212-555-0123
EMAIL: alon@example.com
ADDRESS: 45, 5 Ave, unit 34, New York
```

### Batch Processing

```python
texts = [
    "Add contact Sarah 718-555-4567 sarah@email.com lives at 123 Broadway, Apt 5B, NY 10001",
    "Create contact Michael at 789 Park Avenue, Suite 12, Manhattan, NY 10021 phone 917-555-8901",
    "Register David Martinez 1234 Sunset Boulevard, Los Angeles, CA 90028"
]

for text in texts:
    results = ner_pipeline(text)
    print(f"\nText: {text}")
    for result in results:
        print(f"  - {result['entity_group']}: {result['word']}")
```

## Training Details

### Dataset

- **Size:** 2,185 training examples
- **ADDRESS entities:** 543 occurrences (including full street addresses with building numbers)
- **NAME entities:** 1,897 occurrences
- **PHONE entities:** 564 occurrences
- **EMAIL entities:** 415 occurrences
- **BIRTHDAY entities:** 252 occurrences

### Training Configuration

- **Base Model:** roberta-base
- **Learning Rate:** 3e-5
- **Batch Size:** 16
- **Max Length:** 128 tokens
- **Epochs:** 5
- **Optimizer:** AdamW
- **Training Framework:** Hugging Face Transformers

### Performance Metrics

| Metric | Value |
|--------|-------|
| Accuracy | 95.1% |
| Entity Accuracy | 93.7% |
| Precision | 94.9% |
| Recall | 95.1% |
| F1 Score | 94.6% |

## Key Features

### ✅ Full Address Recognition

Unlike many NER models that only recognize city names, this model recognizes **complete street addresses** including:

- Building numbers (45, 123, 1234, etc.)
- Street names (Broadway, 5 Ave, Sunset Boulevard, etc.)
- Unit/Apartment numbers (unit 34, Apt 5B, Suite 12, Floor 3)
- Cities and states (New York, NY, Los Angeles, CA, etc.)
- ZIP codes (10001, 90028, 77002, etc.)

### Example: Full Address Recognition

**Before (typical NER models):**

```
Input: "add address for Alon 45, 5 ave, unit 34, New York"
ADDRESS: "New York" ❌ (only city)
```

**After (this model):**

```
Input: "add address for Alon 45, 5 ave, unit 34, New York"
ADDRESS: "45, 5 ave, unit 34, New York" ✅ (full address with building number!)
```

## Example Predictions

### Example 1: Complete Contact

```python
text = "Add contact John Smith 212-555-0123 john@example.com 45, 5 Ave, unit 34, New York"
```

**Extracted Entities:**

- NAME: John Smith
- PHONE: 212-555-0123
- EMAIL: john@example.com
- ADDRESS: 45, 5 Ave, unit 34, New York

### Example 2: Address with ZIP Code

```python
text = "Create contact Sarah at 123 Broadway, Apt 5B, New York, NY 10001"
```

**Extracted Entities:**

- NAME: Sarah
- ADDRESS: 123 Broadway, Apt 5B, New York, NY 10001

### Example 3: Complex Address

```python
text = "Save contact for Michael at 789 Park Avenue, Suite 12, Manhattan, NY 10021 phone 917-555-8901"
```

**Extracted Entities:**

- NAME: Michael
- PHONE: 917-555-8901
- ADDRESS: 789 Park Avenue, Suite 12, Manhattan, NY 10021

### Example 4: Different City

```python
text = "Register David Martinez 1234 Sunset Boulevard, Los Angeles, CA 90028"
```

**Extracted Entities:**

- NAME: David Martinez
- ADDRESS: 1234 Sunset Boulevard, Los Angeles, CA 90028

## Intended Use

This model is designed for:

- Contact management applications
- Personal assistant bots
- CRM systems with natural language interface
- Address extraction from text
- Contact information parsing

## Limitations

- **Optimized for US-style addresses** - International addresses not yet in training data
- **Best performance on English text** - Other languages not supported
- **Contact management domain** - May not generalize well to other domains without fine-tuning

## Model Architecture

Based on RoBERTa (Robustly Optimized BERT Pretraining Approach):

- **Layers:** 12 transformer layers
- **Hidden size:** 768
- **Attention heads:** 12
- **Parameters:** ~125M
- **Task:** Token Classification with IOB2 tagging scheme

## Entity Label Format

The model uses IOB2 (Inside-Outside-Beginning) format:

- `B-{ENTITY}`: Beginning of entity
- `I-{ENTITY}`: Inside/continuation of entity
- `O`: Outside any entity

Example:

```
Tokens: ["Add", "contact", "John", "Smith", "212", "-", "555", "-", "0123"]
Labels: ["O", "O", "B-NAME", "I-NAME", "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]
```

## Citation

If you use this model, please cite:

```bibtex
@misc{kotenko2025nermodel,
  author = {Kotenko, Mykyta},
  title = {NER Model for Contact Management Assistant Bot},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/kms-engineer/assistant-bot-ner-model}},
  note = {Based on RoBERTa by Facebook AI. Achieves 95.1\% accuracy with full address recognition including building numbers.}
}
```

## Acknowledgments

- **Base Model:** RoBERTa by Facebook AI Research
- **Framework:** Hugging Face Transformers
- **Training:** Fine-tuned on custom contact management dataset with 2,185 examples
- **Special Feature:** Enhanced address recognition with building numbers, apartments, and full street addresses

## Technical Improvements

This model includes several technical improvements over standard NER models:

1. **Enhanced Tokenization:** Improved handling of addresses with fuzzy matching algorithm
2. **Rich Training Data:** 115+ real-world address examples from major US cities
3. **Address Variations:** Multiple formats including "address-first" patterns
4. **High Accuracy:** 95.1% overall accuracy, 93.7% entity-level accuracy

## Updates

- **v1.0.0 (2025-01-18):** Initial release
  - 95.1% accuracy
  - Full address recognition with building numbers
  - 2,185 training examples
  - Support for 9 entity types

## License

MIT License - See LICENSE file for details.

This model is a derivative work based on RoBERTa, which is licensed under MIT License by Facebook, Inc.

## Contact

- **Author:** Mykyta Kotenko
- **Repository:** [assistant-bot](https://github.com/kms-engineer/assistant-bot)
- **Issues:** Please report issues on GitHub
- **Hugging Face:** [kms-engineer](https://huggingface.co/kms-engineer)

## Related Models

- **Intent Classifier:** [kms-engineer/assistant-bot-intent-classifier](https://huggingface.co/kms-engineer/assistant-bot-intent-classifier)
- **Dataset:** [kms-engineer/assistant-bot-ner-dataset](https://huggingface.co/datasets/kms-engineer/assistant-bot-ner-dataset)
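
## Decoding IOB2 Labels (Illustrative Sketch)

The IOB2 scheme described under Entity Label Format can be turned back into entity spans with a small helper. The sketch below is illustrative only: the function name `iob2_to_spans` is hypothetical and not part of this repository or of Transformers, and it assumes one label per token. With `aggregation_strategy="simple"`, the pipeline already performs a similar grouping, so this is mainly useful for understanding raw per-token output.

```python
from typing import List, Tuple


def iob2_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, List[str]]]:
    """Group IOB2-labelled tokens into (entity_type, tokens) spans.

    Hypothetical helper for illustration; assumes len(tokens) == len(labels).
    """
    spans: List[Tuple[str, List[str]]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one first.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity (if any) and drop this token.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []

    # Flush an entity that runs to the end of the sequence.
    if current_type is not None:
        spans.append((current_type, current_tokens))
    return spans


tokens = ["Add", "contact", "John", "Smith", "212", "-", "555", "-", "0123"]
labels = ["O", "O", "B-NAME", "I-NAME", "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]

spans = iob2_to_spans(tokens, labels)
for entity_type, entity_tokens in spans:
    print(entity_type, entity_tokens)
```

Stray `I-` tags that do not follow a matching `B-` tag are discarded here; a more lenient decoder might instead treat them as the start of a new entity.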