RustBusters BERT Relevance Assessment Model Card
Model Description
- Model Name: RustBusters-BERT-Relevance-Classifier
- Base Model: bert-base-uncased
- Architecture: BERT (Bidirectional Encoder Representations from Transformers)
- Task: Binary Text Classification
- Version: 1.0
- Last Updated: March 2025
Intended Use
This model classifies incoming customer queries as either relevant or not relevant to laser cleaning services. It serves as a first-line filter to:
- Identify queries related to laser cleaning that should be routed to RustBusters' customer service
- Filter out unrelated queries to improve response efficiency
- Help automate initial query triage in customer service workflows
- Support chatbots and digital assistants in determining when to engage with laser cleaning queries
Training Details
- Base Model: bert-base-uncased (110M parameters)
- Training Data: 714 examples (571 training, 143 testing)
  - Positive examples (relevant to laser cleaning): 557 (78%)
  - Negative examples (not relevant to laser cleaning): 157 (22%)
- Training Method: Fine-tuning with AdamW optimizer
- Training Parameters (a minimal fine-tuning sketch using these settings follows this list):
  - Learning rate: 2e-5
  - Batch size: 16
  - Epochs: 3
  - Sequence length: 128 tokens
- Performance on the held-out test set:
  - Final accuracy: 95.8%
  - Relevant class: precision 0.97, recall 0.97, F1-score 0.97
  - Non-relevant class: precision 0.90, recall 0.90, F1-score 0.90
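The original training script is not published; the following is a minimal fine-tuning sketch consistent with the hyperparameters above, using PyTorch and Hugging Face transformers. The placeholder `train_texts`/`train_labels` stand in for the dataset described under Data Characteristics.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

# Placeholder examples; in practice, load the 571-example training split
train_texts = ["Can you laser clean rusty car parts?", "What's the weather today?"]
train_labels = [1, 0]  # 1 = relevant, 0 = not relevant

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
).to(device)

# Tokenize to the 128-token sequence length listed above
enc = tokenizer(train_texts, max_length=128, padding="max_length",
                truncation=True, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"],
                  torch.tensor(train_labels)),
    batch_size=16, shuffle=True,
)

# AdamW at 2e-5 for 3 epochs, per the parameters above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids.to(device),
                    attention_mask=attention_mask.to(device),
                    labels=labels.to(device))
        out.loss.backward()  # cross-entropy loss computed by the classification head
        optimizer.step()
```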
Performance and Limitations
Strengths:
- High accuracy (95.8%) on test set
- Well-balanced precision and recall for both classes
- Effective at identifying laser cleaning related queries
- Small model size (110M parameters), efficient for deployment
- Fast inference times
Limitations:
- Limited to binary classification (relevant vs. not relevant)
- May struggle with highly ambiguous queries
- Cannot categorize queries by type, urgency, or complexity
- Limited exposure to industry-specific terminology beyond training data
- Performance dependent on queries being similar to training examples
Implementation Guidelines
The model assigns label 1 to relevant queries and label 0 to non-relevant queries, and implementations should account for this labeling scheme. The snippet below assumes the fine-tuned checkpoint is loaded through the Hugging Face transformers library; adjust the checkpoint path to match your deployment:
```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# The tokenizer is unchanged by fine-tuning, so the base vocabulary is used;
# replace the model checkpoint path with wherever the fine-tuned weights live.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("Dudeman523/BERT_Rustbusters").to(device)

def classify_query(text):
    # Tokenize to the same 128-token length used during training
    encoding = tokenizer(
        text,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )

    # Get prediction in inference mode
    model.eval()
    with torch.no_grad():
        outputs = model(
            input_ids=encoding['input_ids'].to(device),
            attention_mask=encoding['attention_mask'].to(device)
        )

    # Apply softmax to get probabilities
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)[0]
    class_0_prob = probs[0].item()  # Not relevant probability
    class_1_prob = probs[1].item()  # Relevant probability

    # Simple threshold-based classification
    predicted_class = 1 if class_1_prob > 0.5 else 0

    # Optional: keyword verification as a recall-boosting fallback
    laser_keywords = ["laser", "clean", "rust", "metal", "surface"]
    contains_keywords = any(keyword in text.lower() for keyword in laser_keywords)

    # Return classification result
    if predicted_class == 1 or contains_keywords:
        return "Relevant to laser cleaning"
    else:
        return "Not relevant to laser cleaning"
```
Data Characteristics
The model was trained on a dataset covering:
- Queries about laser cleaning services, pricing, processes, and applications
- Questions about materials that can be laser cleaned (metals, industrial equipment, automotive parts)
- Service area inquiries related to Huntsville and Alabama
- Edge cases like general rust removal without mentioning laser
- Negative examples including:
- General information requests unrelated to laser cleaning
- Other cleaning-related queries that aren't laser-specific
- Questions about completely different services and products
The dataset was systematically expanded through:
- Template-based generation with material/problem variations (a toy sketch follows this list)
- Compound questions combining multiple aspects of laser cleaning
- Paraphrasing of base examples
- Inclusion of carefully labeled ambiguous examples
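As an illustration of the template-based expansion (the templates and slot values below are invented for this sketch, not the actual generation lists):

```python
from itertools import product

# Hypothetical templates and slot fillers for positive (label 1) examples
templates = [
    "Can you laser clean {material}?",
    "How much does it cost to remove {problem} from {material}?",
]
materials = ["cast iron machine parts", "aluminum wheels", "steel beams"]
problems = ["rust", "old paint", "oxidation"]

# str.format ignores unused keyword arguments, so single-slot templates
# work too; the set drops the duplicates that produces
texts = {t.format(material=m, problem=p)
         for t, m, p in product(templates, materials, problems)}
examples = [{"text": text, "label": 1} for text in sorted(texts)]
print(len(examples))  # 12 unique queries from 2 templates
```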
Ethical Considerations
- False Negatives: Important customer inquiries might be misclassified as irrelevant
- Transparency: Users should be informed if their queries are being automatically filtered
- Human Oversight: Regular auditing of model classifications is recommended
- Bias: Monitor for potential bias against certain query formulations or terminology
Maintenance Recommendations
We recommend:
- Periodically retraining with new customer queries to capture evolving language patterns
- Monitoring performance metrics, especially on edge cases
- Adding any consistently misclassified queries to the training dataset (see the logging sketch after this list)
- Considering expansion to multi-class classification for more nuanced routing
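A lightweight way to implement the feedback loop above is to queue human-corrected examples for the next retraining round (the file name and record format here are hypothetical):

```python
import json

def queue_for_retraining(text, model_label, human_label,
                         path="retraining_queue.jsonl"):
    # Append the human-corrected example; merge this file into the
    # training set before the next fine-tuning run
    record = {"text": text, "label": human_label, "model_label": model_label}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```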
Contact Information
For issues, improvements, or questions about this model, please contact the RustBusters AI team.
This model card follows best practices for AI documentation and transparency.