Dudeman523/BERT_Rustbusters

+---
+license: mit
+language:
+- en
+base_model:
+- google-bert/bert-base-uncased
+---
+# RustBusters BERT Relevance Assessment Model Card
+## Model Description
+**Model Name:** RustBusters-BERT-Relevance-Classifier
+**Base Model:** bert-base-uncased
+**Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
+**Task:** Binary Text Classification
+**Version:** 1.0
+**Last Updated:** March 2025
+## Intended Use
+This model is designed to classify incoming customer queries as either relevant or not relevant to laser cleaning services. The model serves as a first-line filter to:
+- Identify queries related to laser cleaning that should be routed to RustBusters' customer service
+- Filter out unrelated queries to improve response efficiency
+- Help automate initial query triage in customer service workflows
+- Support chatbots and digital assistants in determining when to engage with laser cleaning queries
+## Training Details
+- **Base Model:** bert-base-uncased (110M parameters)
+- **Training Data:** 714 examples (571 training, 143 testing)
+  - Positive examples (relevant to laser cleaning): 557 (78%)
+  - Negative examples (not relevant to laser cleaning): 157 (22%)
+- **Training Method:** Fine-tuning with AdamW optimizer
+- **Training Parameters:**
+  - Learning rate: 2e-5
+  - Batch size: 16
+  - Epochs: 3
+  - Sequence length: 128 tokens
+- **Performance:**
+  - Final accuracy: 95.8%
+  - Precision for relevant class: 0.97
+  - Recall for relevant class: 0.97
+  - F1-score for relevant class: 0.97
+  - Precision for non-relevant class: 0.90
+  - Recall for non-relevant class: 0.90
+  - F1-score for non-relevant class: 0.90
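The reported test metrics can be sanity-checked with a little arithmetic. The sketch below assumes a hypothetical confusion matrix. The card does not publish one, so the counts (112 relevant and 31 non-relevant test queries, with 3 errors in each direction) are an illustration chosen to be consistent with the 143-example test set and the rounded figures above, not the actual evaluation output.

```python
# Hypothetical confusion-matrix counts (NOT from the model card); chosen only
# because they reproduce the rounded metrics reported above.
TP = 109  # relevant queries predicted relevant
FN = 3    # relevant queries predicted not relevant
TN = 28   # non-relevant queries predicted not relevant
FP = 3    # non-relevant queries predicted relevant


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = TP + FN + TN + FP
accuracy = (TP + TN) / total

# Relevant class (label 1): positives are the relevant queries.
relevant = precision_recall_f1(TP, FP, FN)
# Non-relevant class (label 0): the roles of the error counts are mirrored.
non_relevant = precision_recall_f1(TN, FN, FP)

# Accuracy of always predicting "relevant", a useful baseline because the
# data is imbalanced (about 78% relevant).
majority_baseline = (TP + FN) / total

print(f"accuracy: {accuracy:.3f}")
print("relevant (P, R, F1):", [round(x, 2) for x in relevant])
print("non-relevant (P, R, F1):", [round(x, 2) for x in non_relevant])
print(f"majority baseline: {majority_baseline:.3f}")
```

Under these assumed counts, the 95.8% accuracy sits well above the roughly 78% obtained by always answering "relevant", which is the more meaningful comparison for an imbalanced test set.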
+## Performance and Limitations
+- **Strengths:**
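+  - High accuracy (95.8%) on the 143-example test set
+  - Precision and recall are equal within each class (0.97 for relevant, 0.90 for non-relevant)
+  - Effective at identifying laser cleaning related queries
+  - Compact by current standards (110M parameters), practical to deploy
+  - Fast inference on short queries (inputs are capped at 128 tokens)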
+- **Limitations:**
+  - Limited to binary classification (relevant vs. not relevant)
+  - May struggle with highly ambiguous queries
+  - Cannot categorize queries by type, urgency, or complexity
+  - Limited exposure to industry-specific terminology beyond training data
+  - Performance dependent on queries being similar to training examples
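One way to soften the ambiguity limitation is to route low-confidence predictions to a person instead of forcing a binary answer. The following is a minimal sketch, not part of the released model: the function name `route_query` and the 0.35/0.65 band are hypothetical and would need tuning on held-out data. It takes the class-1 (relevant) probability as input.

```python
def route_query(relevant_prob, low=0.35, high=0.65):
    """Map the class-1 (relevant) probability to a routing decision.

    `low` and `high` are hypothetical thresholds, not values from the model card.
    """
    if not 0.0 <= relevant_prob <= 1.0:
        raise ValueError("relevant_prob must be a probability in [0, 1]")
    if relevant_prob >= high:
        return "relevant"
    if relevant_prob <= low:
        return "not_relevant"
    # Probabilities inside the band are treated as ambiguous.
    return "human_review"


for p in (0.92, 0.50, 0.10):
    print(p, route_query(p))
```

Widening the band sends more queries to a human and lowers the risk of silently dropping a real customer inquiry. Narrowing it does the opposite.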
+## Implementation Guidelines
+The model assigns label 1 to relevant queries and label 0 to non-relevant queries. Implementations should account for this labeling scheme:
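+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+# Assumption: the fine-tuned weights and tokenizer load from this repo ID;
+# substitute a local checkpoint path if they are stored elsewhere.
+MODEL_ID = "Dudeman523/BERT_Rustbusters"
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).to(device)
+model.eval()  # Disable dropout for inference
+
+def classify_query(text):
+    # Tokenize input to the 128-token sequence length used in training
+    encoding = tokenizer(
+        text,
+        add_special_tokens=True,
+        max_length=128,
+        padding='max_length',
+        truncation=True,
+        return_attention_mask=True,
+        return_tensors='pt'
+    )
+    # Get prediction without tracking gradients
+    with torch.no_grad():
+        outputs = model(
+            input_ids=encoding['input_ids'].to(device),
+            attention_mask=encoding['attention_mask'].to(device)
+        )
+    # Apply softmax to get probabilities
+    probs = torch.nn.functional.softmax(outputs.logits, dim=1)[0]
+    class_0_prob = probs[0].item()  # Not relevant probability
+    class_1_prob = probs[1].item()  # Relevant probability
+    # Simple threshold-based classification
+    predicted_class = 1 if class_1_prob > 0.5 else 0
+    # Optional: keyword check. Note that it overrides the model: any query
+    # containing one of these substrings is returned as relevant, even when
+    # the model predicts label 0. Matching is by substring, so "rust" also
+    # matches "trust".
+    laser_keywords = ["laser", "clean", "rust", "metal", "surface"]
+    contains_keywords = any(keyword in text.lower() for keyword in laser_keywords)
+    # Return classification result
+    if predicted_class == 1 or contains_keywords:
+        return "Relevant to laser cleaning"
+    else:
+        return "Not relevant to laser cleaning"
+```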
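The keyword check in `classify_query` uses plain substring matching, so `"rust"` also matches words such as "trust". If that matters for a deployment, a word-prefix match is one alternative. This is a sketch of a different technique from the one in the card: `contains_laser_keyword` is a hypothetical helper, and the keyword list is copied from the function above.

```python
import re

LASER_KEYWORDS = ["laser", "clean", "rust", "metal", "surface"]

# \b anchors the match to the start of a word; \w* lets the keyword be
# followed by a suffix ("cleaning", "rusty").
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, LASER_KEYWORDS)) + r")\w*",
    re.IGNORECASE,
)


def contains_laser_keyword(text):
    """Return True if any word in `text` starts with one of the keywords."""
    return KEYWORD_PATTERN.search(text) is not None


print(contains_laser_keyword("Can you remove rust from a steel gate?"))
print(contains_laser_keyword("I trust your team with my order"))
```

Words that merely start with a keyword (for example "metallic") still match, so this narrows rather than eliminates keyword false positives.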
+## Data Characteristics
+The model was trained and evaluated on a 714-example dataset containing:
+- Queries about laser cleaning services, pricing, processes, and applications
+- Questions about materials that can be laser cleaned (metals, industrial equipment, automotive parts)
+- Service area inquiries related to Huntsville and Alabama
+- Edge cases such as general rust-removal queries that do not mention lasers
+- Negative examples including:
+  - General information requests unrelated to laser cleaning
+  - Other cleaning-related queries that aren't laser-specific
+  - Questions about completely different services and products
+The dataset was systematically expanded through:
+- Template-based generation with material/problem variations
+- Compound questions combining multiple aspects of laser cleaning
+- Paraphrasing of base examples
+- Inclusion of carefully labeled ambiguous examples
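Template-based generation with material/problem variations can be sketched as a Cartesian product over slot values. The templates, materials, and problems below are invented for illustration; the card does not list the ones actually used.

```python
from itertools import product

# Hypothetical templates and slot values (not taken from the training data).
TEMPLATES = [
    "Can you remove {problem} from {material}?",
    "How much does it cost to clean {problem} off {material}?",
]
MATERIALS = ["steel beams", "aluminum wheels", "cast iron parts"]
PROBLEMS = ["rust", "old paint"]


def generate_queries(templates, materials, problems, label=1):
    """Yield (query, label) pairs for every template/material/problem combination."""
    for template, material, problem in product(templates, materials, problems):
        yield template.format(material=material, problem=problem), label


examples = list(generate_queries(TEMPLATES, MATERIALS, PROBLEMS))
print(len(examples))
print(examples[0])
```

Variants produced from the same template are very similar to each other, so it may be worth keeping all of them on the same side of the train/test split to avoid overstating test accuracy.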
+## Ethical Considerations
+- **False Negatives:** Important customer inquiries might be misclassified as irrelevant; a recall of 0.97 for the relevant class means roughly 3 in 100 relevant test queries were missed
+- **Transparency:** Users should be informed if their queries are being automatically filtered
+- **Human Oversight:** Regular auditing of model classifications is recommended
+- **Bias:** Monitor for potential bias against certain query formulations or terminology
+## Maintenance Recommendations
+We recommend:
+- Periodically retraining with new customer queries to capture evolving language patterns
+- Monitoring performance metrics, especially on edge cases
+- Adding any consistently misclassified queries to the training dataset
+- Considering expansion to multi-class classification for more nuanced routing
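The recommendation to add consistently misclassified queries back into the training data can be sketched as a small audit step. The record layout (`text`, `predicted`, `actual`) and the `min_count` cutoff are assumptions for illustration, not an existing RustBusters log format.

```python
from collections import Counter


def consistently_misclassified(audit_log, min_count=2):
    """Return (text, actual_label) pairs misclassified at least `min_count` times.

    `audit_log` is an iterable of dicts with hypothetical keys
    'text', 'predicted', and 'actual' (the human-verified label).
    """
    errors = Counter(
        (rec["text"].strip().lower(), rec["actual"])
        for rec in audit_log
        if rec["predicted"] != rec["actual"]
    )
    return sorted(pair for pair, count in errors.items() if count >= min_count)


# Made-up audit records for demonstration.
log = [
    {"text": "Do you strip paint from boats?", "predicted": 0, "actual": 1},
    {"text": "do you strip paint from boats?", "predicted": 0, "actual": 1},
    {"text": "What time is it?", "predicted": 0, "actual": 0},
    {"text": "Carpet cleaning quote", "predicted": 1, "actual": 0},
]
print(consistently_misclassified(log))
```

The returned pairs carry the human-verified label, so they can be appended to the training set directly before the next fine-tuning run.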
+## Contact Information
+For issues, improvements, or questions about this model, please contact the RustBusters AI team.
+---
+*This model card follows best practices for AI documentation and transparency.*