Commit 32616d3 by Dudeman523 · verified · 1 Parent(s): b75a2cd

Create README.md

Files changed (1)
  1. README.md +148 -0
README.md ADDED
@@ -0,0 +1,148 @@
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - google-bert/bert-base-uncased
+ ---
+ # RustBusters BERT Relevance Assessment Model Card
+
+ ## Model Description
+
+ **Model Name:** RustBusters-BERT-Relevance-Classifier
+ **Base Model:** bert-base-uncased
+ **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
+ **Task:** Binary Text Classification
+ **Version:** 1.0
+ **Last Updated:** March 2025
+
+ ## Intended Use
+
+ This model is designed to classify incoming customer queries as either relevant or not relevant to laser cleaning services. The model serves as a first-line filter to:
+
+ - Identify queries related to laser cleaning that should be routed to RustBusters' customer service
+ - Filter out unrelated queries to improve response efficiency
+ - Help automate initial query triage in customer service workflows
+ - Support chatbots and digital assistants in determining when to engage with laser cleaning queries
+
+ ## Training Details
+
+ - **Base Model:** bert-base-uncased (110M parameters)
+ - **Training Data:** 714 examples (571 training, 143 testing)
+   - Positive examples (relevant to laser cleaning): 557 (78%)
+   - Negative examples (not relevant to laser cleaning): 157 (22%)
+ - **Training Method:** Fine-tuning with the AdamW optimizer
+ - **Training Parameters:**
+   - Learning rate: 2e-5
+   - Batch size: 16
+   - Epochs: 3
+   - Sequence length: 128 tokens
+ - **Performance:**
+   - Final accuracy: 95.8%
+   - Precision for relevant class: 0.97
+   - Recall for relevant class: 0.97
+   - F1-score for relevant class: 0.97
+   - Precision for non-relevant class: 0.90
+   - Recall for non-relevant class: 0.90
+   - F1-score for non-relevant class: 0.90
+
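To make the training setup concrete, the hyperparameters listed above can be collected in a small config object, from which the number of optimizer steps follows. This is a minimal sketch: the `FineTuneConfig` name and its fields are illustrative and not taken from the RustBusters training code, and it assumes the final partial batch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed in the card."""
    train_examples: int = 571
    test_examples: int = 143
    learning_rate: float = 2e-5
    batch_size: int = 16
    epochs: int = 3
    max_seq_length: int = 128

    @property
    def steps_per_epoch(self) -> int:
        # Assumes the last, smaller batch is kept (not dropped).
        return math.ceil(self.train_examples / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs


config = FineTuneConfig()
print(config.steps_per_epoch)  # 36
print(config.total_steps)      # 108
```

Under these assumptions the model sees roughly a hundred optimizer updates in total, which is typical for fine-tuning on a dataset of this size.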
+ ## Performance and Limitations
+
+ - **Strengths:**
+   - High accuracy (95.8%) on the test set
+   - Precision and recall match within each class (0.97 for relevant, 0.90 for non-relevant)
+   - Effective at identifying laser-cleaning-related queries
+   - Relatively small model (110M parameters), efficient for deployment
+   - Fast inference times
+
+ - **Limitations:**
+   - Limited to binary classification (relevant vs. not relevant)
+   - May struggle with highly ambiguous queries
+   - Cannot categorize queries by type, urgency, or complexity
+   - Limited exposure to industry-specific terminology beyond the training data
+   - Performance depends on queries being similar to the training examples
+
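The reported accuracy, precision, recall, and F1 are all derived from a confusion matrix, which the card does not publish. The sketch below shows how they relate, using hypothetical counts chosen only because they reproduce the reported figures after rounding on a 143-example test set; the actual error counts may differ.

```python
def class_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class from its error counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion matrix for the 143-example test set.
# These counts are NOT from the model card; they are one set of values
# that reproduces the reported figures after rounding.
relevant_correct = 109      # relevant queries predicted relevant
relevant_missed = 3         # relevant queries predicted not relevant
irrelevant_correct = 28     # non-relevant queries predicted not relevant
irrelevant_flagged = 3      # non-relevant queries predicted relevant

total = relevant_correct + relevant_missed + irrelevant_correct + irrelevant_flagged
accuracy = (relevant_correct + irrelevant_correct) / total

# For the non-relevant class the roles of the two error types swap.
rel = class_metrics(tp=relevant_correct, fp=irrelevant_flagged, fn=relevant_missed)
non = class_metrics(tp=irrelevant_correct, fp=relevant_missed, fn=irrelevant_flagged)

print(f"accuracy: {accuracy:.3f}")
print("relevant (P, R, F1):", [round(x, 2) for x in rel])
print("non-relevant (P, R, F1):", [round(x, 2) for x in non])
```

With 143 test examples, a single additional error changes accuracy by about 0.7 percentage points, so small differences between runs should be read with caution.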
+ ## Implementation Guidelines
+
+ The model assigns label 1 for relevant queries and label 0 for non-relevant queries. Implementation should account for this labeling scheme:
+
+ ```python
+ import torch
+
+ # Assumes `tokenizer`, `model`, and `device` are already defined, e.g. a BERT
+ # tokenizer and sequence-classification model loaded and moved to `device`.
+ def classify_query(text):
+     # Tokenize input
+     encoding = tokenizer(
+         text,
+         add_special_tokens=True,
+         max_length=128,
+         padding='max_length',
+         truncation=True,
+         return_attention_mask=True,
+         return_tensors='pt'
+     )
+
+     # Get prediction
+     model.eval()
+     with torch.no_grad():
+         outputs = model(
+             input_ids=encoding['input_ids'].to(device),
+             attention_mask=encoding['attention_mask'].to(device)
+         )
+
+     # Apply softmax to get probabilities
+     probs = torch.nn.functional.softmax(outputs.logits, dim=1)[0]
+     class_0_prob = probs[0].item()  # Not relevant probability
+     class_1_prob = probs[1].item()  # Relevant probability
+
+     # Simple threshold-based classification
+     predicted_class = 1 if class_1_prob > 0.5 else 0
+
+     # Optional: keyword fallback. This is a substring match, so generic
+     # words such as "clean" also mark non-laser queries as relevant.
+     laser_keywords = ["laser", "clean", "rust", "metal", "surface"]
+     contains_keywords = any(keyword in text.lower() for keyword in laser_keywords)
+
+     # Return classification result: relevant if either the model or the
+     # keyword fallback says so (favors recall over precision)
+     if predicted_class == 1 or contains_keywords:
+         return "Relevant to laser cleaning"
+     else:
+         return "Not relevant to laser cleaning"
+ ```
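Because the keyword check in `classify_query` can override the model's prediction, it is worth seeing which queries it fires on. The following sketch isolates that check using the same keyword list; the example queries are hypothetical and were chosen to show where substring matching over-triggers.

```python
LASER_KEYWORDS = ["laser", "clean", "rust", "metal", "surface"]


def contains_keywords(text: str) -> bool:
    """Same substring test the function above applies to the lowercased query."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in LASER_KEYWORDS)


# Hypothetical queries, chosen to show where substring matching over-triggers.
examples = [
    "Can you remove rust from a steel beam with a laser?",
    "How do I clean my carpet?",            # matches "clean"
    "Can I trust online reviews?",          # "trust" contains "rust"
    "What is the weather in Huntsville?",   # no keyword
]

for query in examples:
    print(f"{contains_keywords(query)!s:5}  {query}")
```

If precision matters more than recall in a given deployment, one option is to drop the fallback or restrict it to whole-word matches on laser-specific terms; that is a design choice rather than something this card prescribes.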
+
+ ## Data Characteristics
+
+ The model was trained on a dataset containing:
+ - Queries about laser cleaning services, pricing, processes, and applications
+ - Questions about materials that can be laser cleaned (metals, industrial equipment, automotive parts)
+ - Service area inquiries related to Huntsville and Alabama
+ - Edge cases such as general rust-removal queries that do not mention lasers
+ - Negative examples including:
+   - General information requests unrelated to laser cleaning
+   - Other cleaning-related queries that aren't laser-specific
+   - Questions about completely different services and products
+
+ The dataset was systematically expanded through:
+ - Template-based generation with material/problem variations
+ - Compound questions combining multiple aspects of laser cleaning
+ - Paraphrasing of base examples
+ - Inclusion of carefully labeled ambiguous examples
+
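Template-based generation with material/problem variations can be sketched as a Cartesian product over slot values. The templates, materials, and problems below are hypothetical placeholders; the ones actually used to build the training set are not published in this card.

```python
from itertools import product

# Hypothetical templates and slot values; the actual ones used to build
# the training set are not published in this card.
templates = [
    "Can you remove {problem} from {material}?",
    "How much does it cost to laser clean {problem} off {material}?",
]
materials = ["steel", "aluminum", "cast iron"]
problems = ["rust", "paint", "grease"]

# Every template is filled with every (material, problem) combination.
queries = [
    template.format(material=material, problem=problem)
    for template, material, problem in product(templates, materials, problems)
]

print(len(queries))  # 18
print(queries[0])    # Can you remove rust from steel?
```

Generated examples of this kind tend to be highly regular, which is one reason the paraphrasing and ambiguous-example steps listed above are useful complements.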
+ ## Ethical Considerations
+
+ - **False Negatives:** Important customer inquiries might be misclassified as irrelevant
+ - **Transparency:** Users should be informed if their queries are being automatically filtered
+ - **Human Oversight:** Regular auditing of model classifications is recommended
+ - **Bias:** Monitor for potential bias against certain query formulations or terminology
+
+ ## Maintenance Recommendations
+
+ We recommend:
+ - Periodically retraining with new customer queries to capture evolving language patterns
+ - Monitoring performance metrics, especially on edge cases
+ - Adding any consistently misclassified queries to the training dataset
+ - Considering expansion to multi-class classification for more nuanced routing
+
+ ## Contact Information
+
+ For issues, improvements, or questions about this model, please contact the RustBusters AI team.
+
+ ---
+
+ *This model card follows best practices for AI documentation and transparency.*