---
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
---
# RustBusters BERT Relevance Assessment Model Card

## Model Description

**Model Name:** RustBusters-BERT-Relevance-Classifier  
**Base Model:** bert-base-uncased  
**Architecture:** BERT (Bidirectional Encoder Representations from Transformers)  
**Task:** Binary Text Classification  
**Version:** 1.0  
**Last Updated:** March 2025  

## Intended Use

This model is designed to classify incoming customer queries as either relevant or not relevant to laser cleaning services. The model serves as a first-line filter to:

- Identify queries related to laser cleaning that should be routed to RustBusters' customer service
- Filter out unrelated queries to improve response efficiency
- Help automate initial query triage in customer service workflows
- Support chatbots and digital assistants in determining when to engage with laser cleaning queries

## Training Details

- **Base Model:** bert-base-uncased (110M parameters)
- **Dataset:** 714 labeled examples (571 for training, 143 held out for testing)
  - Positive examples (relevant to laser cleaning): 557 (78%)
  - Negative examples (not relevant to laser cleaning): 157 (22%)
- **Training Method:** Fine-tuning with AdamW optimizer
- **Training Parameters:**
  - Learning rate: 2e-5
  - Batch size: 16
  - Epochs: 3
  - Sequence length: 128 tokens
- **Performance:**
  - Test-set accuracy: 95.8%
  - Precision for relevant class: 0.97
  - Recall for relevant class: 0.97
  - F1-score for relevant class: 0.97
  - Precision for non-relevant class: 0.90
  - Recall for non-relevant class: 0.90
  - F1-score for non-relevant class: 0.90
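
The per-class figures above are the kind of numbers derived from a confusion matrix over the 143 test examples. The card does not publish that matrix, so the counts in this sketch are hypothetical, chosen only because they reproduce the reported values; `per_class_metrics` is an illustrative helper, not part of the released code.

```python
def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (NOT taken from the model card): 112 relevant and
# 31 non-relevant test examples, with 3 errors in each direction.
relevant_hit, relevant_missed = 109, 3          # true label: relevant
non_relevant_hit, non_relevant_missed = 28, 3   # true label: not relevant

total = relevant_hit + relevant_missed + non_relevant_hit + non_relevant_missed
accuracy = (relevant_hit + non_relevant_hit) / total

# A missed non-relevant query is a false positive for the relevant class,
# and a missed relevant query is a false positive for the non-relevant class.
relevant = per_class_metrics(tp=relevant_hit, fp=non_relevant_missed, fn=relevant_missed)
non_relevant = per_class_metrics(tp=non_relevant_hit, fp=relevant_missed, fn=non_relevant_missed)

print(f"accuracy: {accuracy:.3f}")
print("relevant     P=%.2f R=%.2f F1=%.2f" % relevant)
print("non-relevant P=%.2f R=%.2f F1=%.2f" % non_relevant)
```

With so few non-relevant test examples, a single additional error moves that class's scores by about three percentage points, so the 0.90 figures carry more uncertainty than the 0.97 ones.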

## Performance and Limitations

- **Strengths:**
  - High accuracy (95.8%) on test set
  - Precision and recall match within each class (0.97 for relevant, 0.90 for non-relevant), though the non-relevant class scores lower
  - Effective at identifying laser cleaning related queries
  - Compact by transformer standards (110M parameters), which keeps deployment practical
  - Fast inference times

- **Limitations:**
  - Limited to binary classification (relevant vs. not relevant)
  - May struggle with highly ambiguous queries
  - Cannot categorize queries by type, urgency, or complexity
  - Limited exposure to industry-specific terminology beyond training data
  - Performance dependent on queries being similar to training examples

## Implementation Guidelines

The model assigns label 1 for relevant queries and label 0 for non-relevant queries. Implementation should account for this labeling scheme:

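```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder: point this at the fine-tuned checkpoint (local directory or Hub ID).
MODEL_PATH = "path/to/rustbusters-bert-relevance-classifier"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH).to(device)
model.eval()  # Disable dropout for inference

# Keywords for the optional override. Matching is by substring, so "clean"
# also matches "cleaning" and "metal" also matches "metallic".
LASER_KEYWORDS = ["laser", "clean", "rust", "metal", "surface"]


def classify_query(text, use_keyword_check=True):
    # Tokenize input, padding/truncating to the 128-token training length
    encoding = tokenizer(
        text,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )

    # Get prediction without tracking gradients
    with torch.no_grad():
        outputs = model(
            input_ids=encoding['input_ids'].to(device),
            attention_mask=encoding['attention_mask'].to(device)
        )

    # Apply softmax to get probabilities; index 0 is "not relevant", index 1 is "relevant"
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)[0]
    class_1_prob = probs[1].item()

    # Simple threshold-based classification
    predicted_class = 1 if class_1_prob > 0.5 else 0

    # Optional keyword override: any keyword hit marks the query as relevant,
    # regardless of the model output. This favors recall but adds false positives.
    contains_keywords = use_keyword_check and any(
        keyword in text.lower() for keyword in LASER_KEYWORDS
    )

    # Return classification result
    if predicted_class == 1 or contains_keywords:
        return "Relevant to laser cleaning"
    return "Not relevant to laser cleaning"
```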
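
Because the model may struggle with ambiguous queries, one option is to replace the single 0.5 cutoff with a review band, so that low-confidence predictions go to a person instead of being filtered. The sketch below is a hypothetical extension: the `triage` name, the three outcome strings, and the 0.35 and 0.65 bounds are assumptions for illustration and have not been tuned on the RustBusters data.

```python
def triage(relevant_prob, low=0.35, high=0.65):
    """Map the relevant-class probability to a routing decision.

    `low` and `high` are illustrative bounds (assumptions, not tuned values).
    Probabilities strictly between them are sent to human review.
    """
    if not 0.0 <= relevant_prob <= 1.0:
        raise ValueError("relevant_prob must be a probability in [0, 1]")
    if low > high:
        raise ValueError("low must not exceed high")

    if relevant_prob >= high:
        return "relevant"
    if relevant_prob <= low:
        return "not_relevant"
    return "needs_review"


# Example: feed in the softmax probability of label 1 from the classifier.
for prob in (0.92, 0.50, 0.08):
    print(prob, triage(prob))
```

Widening the band sends more queries to review and reduces the chance that a genuine customer inquiry is silently dropped; narrowing it does the opposite.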

## Data Characteristics

The model was trained on a dataset containing:
- Queries about laser cleaning services, pricing, processes, and applications
- Questions about materials that can be laser cleaned (metals, industrial equipment, automotive parts)
- Service area inquiries related to Huntsville and Alabama
- Edge cases such as general rust-removal queries that do not mention lasers
- Negative examples including:
  - General information requests unrelated to laser cleaning
  - Other cleaning-related queries that aren't laser-specific
  - Questions about completely different services and products

The dataset was systematically expanded through:
- Template-based generation with material/problem variations
- Compound questions combining multiple aspects of laser cleaning
- Paraphrasing of base examples
- Inclusion of carefully labeled ambiguous examples
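
Template-based generation with material/problem variations can be sketched as a Cartesian product over slot values. The templates, materials, and problems below are invented for illustration and are not the ones used to build the actual dataset; `expand_templates` is a hypothetical helper.

```python
from itertools import product

# Illustrative slot values (assumptions, not the real training templates).
TEMPLATES = [
    "Can you remove {problem} from {material} with laser cleaning?",
    "How much does it cost to laser clean {problem} off {material}?",
]
MATERIALS = ["steel beams", "automotive parts"]
PROBLEMS = ["rust", "old paint"]


def expand_templates(templates, materials, problems, label=1):
    """Fill every template with every material/problem pair.

    Returns a list of {"text", "label"} dicts; label 1 means relevant,
    matching the labeling scheme described in this card.
    """
    return [
        {"text": template.format(material=material, problem=problem), "label": label}
        for template, material, problem in product(templates, materials, problems)
    ]


examples = expand_templates(TEMPLATES, MATERIALS, PROBLEMS)
print(len(examples))
print(examples[0]["text"])
```

Generated examples share sentence structure, so a test set drawn from the same templates can overstate how well the model handles freely phrased customer queries.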

## Ethical Considerations

- **False Negatives:** Important customer inquiries might be misclassified as irrelevant
- **Transparency:** Users should be informed if their queries are being automatically filtered
- **Human Oversight:** Regular auditing of model classifications is recommended
- **Bias:** Monitor for potential bias against certain query formulations or terminology

## Maintenance Recommendations

We recommend:
- Periodically retraining with new customer queries to capture evolving language patterns
- Monitoring performance metrics, especially on edge cases
- Adding any consistently misclassified queries to the training dataset
- Considering expansion to multi-class classification for more nuanced routing
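
As a minimal sketch of the third recommendation, the snippet below gathers queries whose human-reviewed label disagrees with the model's prediction, ready to be added to the training set. The record layout (`text`, `predicted`, `reviewed_label`) and the `collect_misclassified` helper are assumptions made for illustration; the card does not describe an actual logging format.

```python
def collect_misclassified(records):
    """Return deduplicated {"text", "label"} dicts for reviewed mistakes.

    Each record is assumed to be a dict with keys "text", "predicted"
    (0 or 1), and "reviewed_label" (0, 1, or None if not yet reviewed).
    """
    seen = set()
    corrections = []
    for record in records:
        reviewed = record["reviewed_label"]
        if reviewed is None or reviewed == record["predicted"]:
            continue  # Skip unreviewed records and correct predictions
        key = record["text"].strip().lower()
        if key in seen:
            continue  # Keep only the first copy of a repeated query
        seen.add(key)
        corrections.append({"text": record["text"], "label": reviewed})
    return corrections


# Hypothetical audit log for demonstration.
audit_log = [
    {"text": "Do you strip paint from boat hulls?", "predicted": 0, "reviewed_label": 1},
    {"text": "do you strip paint from boat hulls? ", "predicted": 0, "reviewed_label": 1},
    {"text": "What are your office hours?", "predicted": 1, "reviewed_label": None},
    {"text": "Can you clean my carpets?", "predicted": 1, "reviewed_label": 0},
    {"text": "Laser rust removal quote", "predicted": 1, "reviewed_label": 1},
]
new_training_examples = collect_misclassified(audit_log)
print(new_training_examples)
```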

## Contact Information

For issues, improvements, or questions about this model, please contact the RustBusters AI team.

---

*This model card follows best practices for AI documentation and transparency.*