dejanseo committed on
Commit
5521d65
·
verified ·
1 Parent(s): be05a04

Update README.md

  1. README.md +200 -5
README.md CHANGED
@@ -1,5 +1,200 @@
1
- ---
2
- license: other
3
- license_name: link-attribution
4
- license_link: https://dejanmarketing.com/link-attribution/
5
- ---
1
+ ---
2
+ license: other
3
+ license_name: link-attribution
4
+ license_link: https://dejan.ai/blog/query-length-vs-volume/
5
+ language:
6
+ - en
7
+ library_name: transformers
8
+ pipeline_tag: text-classification
9
+ tags:
10
+ - deberta-v2
11
+ - deberta-v3
12
+ - ecommerce
13
+ - search
14
+ - query-volume
15
+ - seo
16
+ - keyword-research
17
+ - amazon
18
+ base_model: microsoft/deberta-v3-base
19
+ datasets:
20
+ - amazon/AmazonQAC
21
+ metrics:
22
+ - accuracy
23
+ - f1
24
+ model-index:
25
+ - name: ecommerce-query-volume-classifier
26
+   results:
27
+   - task:
28
+       type: text-classification
29
+       name: Search Query Volume Classification
30
+     dataset:
31
+       name: Amazon Shopping Queries (AmazonQAC)
32
+       type: amazon/AmazonQAC
33
+     metrics:
34
+     - name: Accuracy
35
+       type: accuracy
36
+       value: 0.721
37
+     - name: Macro F1
38
+       type: f1
39
+       value: 0.6877
40
+     - name: Spearman Correlation
41
+       type: spearmanr
42
+       value: 0.896
43
+ ---
44
+
45
+ # eCommerce Query Volume Classifier
46
+
47
+ A fine-tuned [DeBERTa v3 base](https://huggingface.co/microsoft/deberta-v3-base) model that predicts the search volume class of ecommerce product queries. Training data is sampled from the 39.6 million unique queries in the [Amazon Shopping Queries](https://huggingface.co/datasets/amazon/AmazonQAC) dataset, which spans 395.5 million search sessions.
48
+
49
+ **Blog post:** [Is Query Length a Reliable Predictor of Search Volume?](https://dejan.ai/blog/query-length-vs-volume/)
50
+
51
+ ## Model Description
52
+
53
+ This model classifies ecommerce search queries into five volume tiers based on their expected search popularity:
54
+
55
+ | Label | Class | Occurrences | Description |
56
+ |-------|-------|-------------|-------------|
57
+ | 0 | `very_high` | 10,000+ | Head terms, major brands (e.g. "airpods", "laptop") |
58
+ | 1 | `high` | 1,000–9,999 | Popular product categories and well-known items |
59
+ | 2 | `medium` | 100–999 | Moderately specific queries |
60
+ | 3 | `low` | 10–99 | Niche or qualified queries |
61
+ | 4 | `very_low` | <10 | Long-tail, highly specific queries |
62
+
63
+ The model learns semantic signals — brand recognition, category head terms, specificity markers — rather than superficial features like query length. Simple character/word-count heuristics achieve only ~25% accuracy on this task (barely above the 20% random baseline), while this model achieves **72.1% accuracy**.
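The tier boundaries in the table can be written as a small lookup. This is a minimal sketch for labeling your own query counts; the function name `volume_class` and the treatment of a zero count as `very_low` are assumptions, not part of the released model.

```python
# Tier boundaries copied from the table above:
# (minimum occurrences, label id, class name), highest tier first.
TIERS = [
    (10_000, 0, "very_high"),
    (1_000, 1, "high"),
    (100, 2, "medium"),
    (10, 3, "low"),
    (0, 4, "very_low"),
]


def volume_class(occurrences: int) -> tuple[int, str]:
    """Map a raw occurrence count to (label id, class name)."""
    if occurrences < 0:
        raise ValueError("occurrences must be non-negative")
    for minimum, label_id, name in TIERS:
        if occurrences >= minimum:
            return label_id, name
    raise AssertionError("unreachable: the last tier starts at 0")


if __name__ == "__main__":
    for count in (25_000, 9_999, 100, 99, 3):
        print(count, volume_class(count))
```

The label ids follow the table's ordering, so a lower id means a higher volume tier.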
64
+
65
+ ## Usage
66
+
67
+ ```python
68
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
69
+ import torch
70
+
71
+ model_name = "dejanseo/ecommerce-query-volume-classifier"
72
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
73
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
74
+ model.eval()
75
+
76
+ labels = ["very_high", "high", "medium", "low", "very_low"]
77
+
78
+ queries = [
79
+     "airpods",
80
+     "wireless mouse",
81
+     "organic flurb capsules",
82
+     "replacement gasket for instant pot duo 8 quart",
83
+ ]
84
+
85
+ inputs = tokenizer(queries, return_tensors="pt", padding=True, truncation=True, max_length=32)
86
+
87
+ with torch.no_grad():
88
+     outputs = model(**inputs)
89
+     probs = torch.softmax(outputs.logits, dim=-1)
90
+     preds = torch.argmax(probs, dim=-1)
91
+
92
+ for query, pred, prob in zip(queries, preds, probs):
93
+     label = labels[pred.item()]
94
+     confidence = prob[pred.item()].item() * 100
95
+     print(f"{query:50s} → {label:>10s} ({confidence:.1f}%)")
96
+ ```
97
+
98
+ ## Performance
99
+
100
+ ### Evaluation (25K balanced sample, 5K per class)
101
+
102
+ | Method | Accuracy | Spearman ρ |
103
+ |--------|----------|------------|
104
+ | **This model** | **72.1%** | **0.896** |
105
+ | Word count heuristic | 25.4% | -0.345 |
106
+ | Char count heuristic | 24.9% | -0.336 |
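
The README does not say how the length heuristics turn a count into a class. One plausible reading, shown here purely as an assumption, is that shorter queries are guessed to be higher volume: one word maps to label 0 (`very_high`) and five or more words map to label 4 (`very_low`). The helper names below are hypothetical.

```python
def word_count_heuristic(query: str) -> int:
    """Hypothetical baseline: fewer words -> higher volume tier (lower label id)."""
    n_words = len(query.split())
    # Clamp to the range 1..5, then shift to label ids 0..4.
    return min(max(n_words, 1), 5) - 1


def accuracy(predictions: list[int], targets: list[int]) -> float:
    """Fraction of predictions that match the target labels."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    queries = ["airpods", "wireless mouse", "replacement gasket for instant pot duo 8 quart"]
    print([word_count_heuristic(q) for q in queries])
```

Scoring such a baseline against the balanced evaluation sample is then a single `accuracy(...)` call over the heuristic's predictions and the true labels.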
107
+
108
+ ### Per-Class F1 Scores (best validation checkpoint)
109
+
110
+ | Class | Precision | Recall | F1 |
111
+ |-------|-----------|--------|----|
112
+ | very_high | 0.892 | 0.980 | 0.934 |
113
+ | high | 0.727 | 0.921 | 0.813 |
114
+ | medium | 0.625 | 0.790 | 0.698 |
115
+ | low | 0.496 | 0.335 | 0.400 |
116
+ | very_low | 0.610 | 0.579 | 0.594 |
117
+
118
+ The model performs best on the two highest-volume classes (`very_high` and `high`) and struggles most with the `low` class, which sits in an ambiguous zone between `medium` and `very_low`. The `very_low` class is the second weakest (F1 0.594).
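The F1 column and the headline macro F1 can be re-derived from the precision and recall values in the table. The sketch below only recomputes the published numbers; it is not the evaluation script used for the model, and the variable names are illustrative.

```python
# (precision, recall) per class, copied from the table above.
ROWS = {
    "very_high": (0.892, 0.980),
    "high": (0.727, 0.921),
    "medium": (0.625, 0.790),
    "low": (0.496, 0.335),
    "very_low": (0.610, 0.579),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


per_class_f1 = {name: f1(p, r) for name, (p, r) in ROWS.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
mean_recall = sum(r for _, r in ROWS.values()) / len(ROWS)

if __name__ == "__main__":
    for name, score in per_class_f1.items():
        print(f"{name:>10s}  F1 = {score:.3f}")
    print(f"macro F1    = {macro_f1:.4f}")
    print(f"mean recall = {mean_recall:.3f}")
```

The unweighted mean of the per-class F1 values comes out at about 0.688, in line with the reported macro F1. The mean recall is 0.721, which matches the reported accuracy, as expected when every class contributes the same number of examples.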
119
+
120
+ ## Training Details
121
+
122
+ ### Hyperparameters
123
+
124
+ | Parameter | Value |
125
+ |-----------|-------|
126
+ | Base model | `microsoft/deberta-v3-base` |
127
+ | Epochs | 20 |
128
+ | Batch size | 128 |
129
+ | Learning rate | 3e-5 |
130
+ | Max sequence length | 32 |
131
+ | Warmup ratio | 0.1 |
132
+ | Weight decay | 0.01 |
133
+ | Label smoothing | 0.1 |
134
+ | Scheduler | Linear with warmup |
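
To make the schedule concrete, the sketch below works out the step counts implied by the table and the learning rate at a given step. It assumes the last partial batch of each epoch is kept (ceiling division), that the 324,000 training examples per epoch from the sampling section apply, and that the rate decays linearly to zero after warmup; none of these details are confirmed by the README.

```python
import math

EPOCHS = 20
BATCH_SIZE = 128
TRAIN_EXAMPLES_PER_EPOCH = 324_000  # from the sampling section of this README
PEAK_LR = 3e-5
WARMUP_RATIO = 0.1

# Assumption: the final partial batch of each epoch is not dropped.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES_PER_EPOCH / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(steps_per_epoch, total_steps, warmup_steps)
    for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
        print(step, lr_at(step))
```

Under these assumptions the warmup phase covers the first two epochs of the twenty.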
135
+
136
+ ### Sampling Strategy
137
+
138
+ Each epoch draws a fresh sample with a different random seed, using fixed per-class sizes (the two smallest classes contribute fewer samples):
139
+
140
+ | Class | Samples per epoch |
141
+ |-------|-------------------|
142
+ | very_low | 100,000 |
143
+ | low | 100,000 |
144
+ | medium | 100,000 |
145
+ | high | 30,000 |
146
+ | very_high | 30,000 |
147
+
148
+ **Total per epoch:** 324,000 train / 36,000 validation
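
A minimal sketch of this per-epoch sampling follows. The function name `sample_epoch`, the seed scheme (`base_seed + epoch`), and the way the validation slice is carved off are assumptions. Sampling with replacement when a class has fewer unique queries than its quota is also an assumption; it is suggested by `very_high` having roughly 18K unique queries against a 30,000 quota.

```python
import random

# Per-class sample sizes from the table above.
QUOTAS = {
    "very_low": 100_000,
    "low": 100_000,
    "medium": 100_000,
    "high": 30_000,
    "very_high": 30_000,
}


def sample_epoch(pools, quotas, epoch, base_seed=0, val_fraction=0.1):
    """Draw one epoch of (query, class) pairs and split it into train/validation.

    pools maps class name -> list of queries. Classes smaller than their
    quota are sampled with replacement (an assumption, see the lead-in).
    """
    rng = random.Random(base_seed + epoch)  # a different seed for every epoch
    examples = []
    for name, quota in quotas.items():
        pool = pools[name]
        if len(pool) >= quota:
            picked = rng.sample(pool, quota)     # without replacement
        else:
            picked = rng.choices(pool, k=quota)  # with replacement
        examples.extend((query, name) for query in picked)
    rng.shuffle(examples)
    n_val = int(len(examples) * val_fraction)
    return examples[n_val:], examples[:n_val]


if __name__ == "__main__":
    # Tiny stand-in pools; the real pools would hold the dataset's queries.
    toy_pools = {name: [f"{name} query {i}" for i in range(50)] for name in QUOTAS}
    toy_quotas = {name: quota // 10_000 for name, quota in QUOTAS.items()}
    train, val = sample_epoch(toy_pools, toy_quotas, epoch=0)
    print(len(train), len(val))
```

With the full quotas the draw totals 360,000 examples, and a 10% validation fraction reproduces the 324,000 / 36,000 split stated above.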
149
+
150
+ ### Hardware
151
+
152
+ - **GPU:** NVIDIA GeForce RTX 4090 (24 GB)
153
+ - **RAM:** 128 GB
154
+ - **OS:** Windows 11
155
+ - **Training time:** ~2 hours 16 minutes
156
+ - **Framework:** PyTorch + Transformers 4.57.1
157
+
158
+ ### Dataset
159
+
160
+ [Amazon Shopping Queries (AmazonQAC)](https://huggingface.co/datasets/amazon/AmazonQAC): 395.5 million sessions, 39.6 million unique queries. Volume classes are derived from raw occurrence counts across sessions.
161
+
162
+ | Class | Unique Queries |
163
+ |-------|---------------|
164
+ | very_high | ~18K |
165
+ | high | ~30K |
166
+ | medium | ~321K |
167
+ | low | ~4.6M |
168
+ | very_low | ~34.7M |
169
+
170
+ ## What the Model Learns
171
+
172
+ The model captures semantic patterns rather than surface-level features like query length:
173
+
174
+ - **Brand recognition:** "airpods" → very high, regardless of character count
175
+ - **Category head terms:** "laptop", "headphones", "dog food" → recognized as high-volume entry points
176
+ - **Specificity markers:** Size specs, compatibility constraints, and material callouts signal niche demand
177
+ - **Nonsense detection:** Gibberish queries like "blorf" and "wireless blorf adapter" are classified as very low volume, which suggests the model is not simply counting characters
178
+
179
+ ## Limitations
180
+
181
+ - Trained exclusively on Amazon product search queries — may not generalize well to Google web search, informational queries, or non-English markets
182
+ - The `low` volume class is the weakest (F1 ≈ 0.40), reflecting genuine ambiguity in the boundary between medium and very low volume queries
183
+ - Volume thresholds are based on the Amazon QAC dataset's session counts, which may not map directly to other volume scales (e.g. Google Keyword Planner)
184
+ - Product trends shift over time; queries that were high volume in the training data may not remain so
185
+
186
+ ## Citation
187
+
188
+ ```bibtex
189
+ @article{petrovic2026querylength,
190
+ title={Is Query Length a Reliable Predictor of Search Volume?},
191
+ author={Petrovic, Dan},
192
+ year={2026},
193
+ month={March},
194
+ url={https://dejan.ai/blog/query-length-vs-volume/}
195
+ }
196
+ ```
197
+
198
+ ## Author
199
+
200
+ **Dan Petrovic** — [DEJAN AI](https://dejan.ai/)