abdoelsayed committed
Commit 70d8365 · verified · 1 Parent(s): ba36055

Create README.md

Files changed (1):
  1. README.md +355 -0
---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- binary-cross-entropy
- efficient
- llama
base_model: meta-llama/Llama-3.2-3B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-3B-Reranker-CE-v1

## Model Description

**DeAR-3B-Reranker-CE-v1** is an efficient 3B-parameter pointwise reranker trained with Binary Cross-Entropy (BCE) loss and knowledge distillation. It provides fast, reliable reranking for production environments where latency and efficiency are critical.

## Model Details

- **Model Type:** Pointwise Reranker (Binary Classification)
- **Base Model:** LLaMA-3.2-3B
- **Parameters:** 3 billion
- **Training Method:** Knowledge Distillation + Binary Cross-Entropy
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO + DeAR-COT
- **Model Size:** 6GB (BF16)

## Key Features

✅ **Ultra Fast:** ~1.5s inference (fastest in the DeAR family)
✅ **Memory Efficient:** Runs on a single 16GB GPU
✅ **Production Ready:** Stable training with BCE loss
✅ **Cost Effective:** Lower computational cost
✅ **Binary Classification:** Probabilistic relevance scores (see the sketch below)

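Because the model is trained with a binary cross-entropy objective, its single output logit can be mapped to a relevance probability with a sigmoid. A minimal sketch (the helper name is ours, and `score` stands for the raw logit returned in the Quick Start below):

```python
import math

def logit_to_probability(score: float) -> float:
    """Map a raw relevance logit to a probability in [0, 1] via the sigmoid."""
    return 1.0 / (1.0 + math.exp(-score))

# Example: a logit of -6.06 corresponds to a probability of roughly 0.002,
# while a logit of +2.0 corresponds to roughly 0.88.
print(logit_to_probability(-6.06))
print(logit_to_probability(2.0))
```
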
## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-3b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```

### Efficient Batch Processing

```python
import torch
from typing import List, Tuple

@torch.inference_mode()
def fast_rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 128):
    """Fast reranking optimized for the 3B model. `docs` is a list of (title, passage) pairs."""
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]

        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {t} {p}" for t, p in batch]

        # Tokenize
        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Score
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    # Rank documents by descending score
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Thomas Edison invented the light bulb in 1879"),
    ("", "Coffee is good for diet"),
    ("", "Lightning strike at Seoul National University"),
]

ranking = fast_rerank(tokenizer, model, query, docs, batch_size=128)
print(ranking)
# Example output:
# [(0, -6.0625), (2, -11.125), (1, -12.0625)]
```

### Production Optimization

```python
# Optimize for maximum throughput
model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ce-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

# Compile for a 20-30% speedup (PyTorch 2.0+)
if hasattr(torch, 'compile'):
    model = torch.compile(model, mode="max-autotune")

# Use larger batches for throughput
batch_size = 128  # the 3B model can handle larger batches
```

## Training Details

### Training Configuration

```json
{
  "base_model": "meta-llama/Llama-3.2-3B",
  "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
  "loss": "Binary Cross-Entropy",
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1
  },
  "learning_rate": 1e-4,
  "batch_size": 4,
  "gradient_accumulation": 2,
  "epochs": 2,
  "max_length": 228,
  "bf16": true
}
```

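The exact training code lives in the GitHub repository linked below. As a rough illustration of how the configuration above could combine the two objectives, here is a hedged sketch of a BCE-plus-distillation loss in PyTorch. The weighting scheme (`alpha` scaling the distillation term, `temperature` softening both student and teacher logits) and the function name are assumptions for illustration, not the verified implementation:

```python
import torch
import torch.nn.functional as F

def distillation_bce_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          labels: torch.Tensor,
                          temperature: float = 2.0,
                          alpha: float = 0.1) -> torch.Tensor:
    """Hypothetical combination of hard-label BCE and teacher distillation
    for a pointwise reranker that emits one relevance logit per pair."""
    # Hard-label term: standard binary cross-entropy against 0/1 relevance labels.
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels.float())

    # Soft-label term: match the teacher's temperature-softened relevance probability.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)

    # alpha weights the distillation term (0.1 in the configuration above).
    return (1 - alpha) * hard_loss + alpha * soft_loss
```
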
### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~17 hours
- **Memory Usage:** ~24GB per GPU
- **Trainable Parameters:** 3B

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 |
|---------|---------|---------|--------|
| DL19 | 70.8 | 67.3 | 83.9 |
| DL20 | 68.9 | 65.8 | 81.7 |

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 65.3 |
| NQ | 48.7 |
| HotpotQA | 57.9 |
| FiQA | 43.6 |
| ArguAna | 55.8 |
| SciFact | 70.2 |
| TREC-COVID | 81.8 |
| NFCorpus | 37.2 |
| **Average** | **41.7** |

### Efficiency

| Metric | 3B-CE | 8B-CE | Improvement |
|--------|-------|-------|-------------|
| Inference (100 docs) | 1.5s | 2.2s | **1.5x faster** |
| Throughput | 67 docs/s | 45 docs/s | **1.5x** |
| GPU Memory | 12GB | 18GB | **33% less** |
| Model Size | 6GB | 16GB | **62% smaller** |

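To reproduce metrics such as NDCG@10 for your own runs, the reranker output can be scored against standard qrels with a TREC-style evaluator. A minimal sketch using the `pytrec_eval` package (our choice for illustration; the qrels and document IDs below are placeholders, not the official files):

```python
import pytrec_eval

# Placeholder qrels: {query_id: {doc_id: relevance_grade}}
qrels = {"q1": {"d1": 1, "d2": 0, "d3": 0}}

# Reranker run: {query_id: {doc_id: score}}, e.g. built from fast_rerank output.
run = {"q1": {"d1": -6.0625, "d2": -12.0625, "d3": -11.125}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut"})
results = evaluator.evaluate(run)
print(results["q1"]["ndcg_cut_10"])
```
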
## Comparison

### vs. Other 3B Models

| Model | Loss | DL19 | DL20 | Speed (s) |
|-------|------|------|------|-----------|
| **DeAR-3B-CE** | BCE | 70.8 | 68.9 | 1.5 |
| DeAR-3B-RankNet | RankNet | 71.2 | 69.4 | 1.5 |
| MonoT5-3B | - | 71.8 | 68.9 | 3.5 |

**Key Advantages:**
- 2.3x faster than MonoT5-3B
- Comparable accuracy
- More stable training (BCE vs. more complex ranking losses)

## When to Use

**Best for:**
- ✅ High-throughput production systems
- ✅ Real-time applications (latency <2s)
- ✅ Cost-sensitive deployments
- ✅ Edge deployment (smaller GPUs)
- ✅ Binary relevance tasks

**Consider alternatives for:**
- ❌ Maximum accuracy (use the 8B models)
- ❌ Complex reasoning queries (use a listwise reranker)
- ❌ Unlimited compute budget

## Deployment Examples

### REST API Server

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI()

# Load model once at startup
tokenizer, model = None, None

@app.on_event("startup")
async def load_model():
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-3b-reranker-ce-v1")
    model = AutoModelForSequenceClassification.from_pretrained(
        "abdoelsayed/dear-3b-reranker-ce-v1",
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    model.eval()
    if hasattr(torch, 'compile'):
        model = torch.compile(model)

class RerankRequest(BaseModel):
    query: str
    documents: List[str]

@app.post("/rerank")
async def rerank(request: RerankRequest):
    # fast_rerank is defined in "Efficient Batch Processing" above;
    # documents are passed as (title, passage) pairs with empty titles.
    ranking = fast_rerank(tokenizer, model, request.query,
                          [("", doc) for doc in request.documents])
    return {"ranking": ranking}
```

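Once the server is running (for example via `uvicorn app:app`), it can be queried from any HTTP client. A small sketch using the `requests` library, assuming the default local address and port:

```python
import requests

response = requests.post(
    "http://localhost:8000/rerank",
    json={
        "query": "When did Thomas Edison invent the light bulb?",
        "documents": [
            "Thomas Edison invented the light bulb in 1879",
            "Coffee is good for diet",
        ],
    },
)
# The ranking is a list of (document_index, score) pairs, highest score first.
print(response.json()["ranking"])
```
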
### Batch Processing Script

```python
import pandas as pd
from tqdm import tqdm

# Load queries and documents; each row is expected to provide a query string
# and a list of (title, passage) pairs in the 'documents' column.
df = pd.read_csv("queries_docs.csv")

results = []
for _, row in tqdm(df.iterrows(), total=len(df)):
    ranking = fast_rerank(tokenizer, model, row['query'], row['documents'])
    results.append({
        'query_id': row['query_id'],
        'ranking': ranking
    })

# Save results
pd.DataFrame(results).to_csv("reranked.csv", index=False)
```

## Model Architecture

```
Input: "query: [Q] [SEP] document: [D]"
        ↓
LLaMA-3.2-3B (24 layers, 3072 hidden)
        ↓
[CLS] Token Pooling
        ↓
Linear(3072 → 1)
        ↓
Binary Relevance Score
```

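To double-check the scoring-head shape described above on a downloaded checkpoint, the published configuration can be inspected directly. A quick check (the comments state expectations based on the diagram, not guarantees):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("abdoelsayed/dear-3b-reranker-ce-v1")
print(config.hidden_size)   # backbone hidden width feeding the linear scoring head
print(config.num_labels)    # expected to be 1, matching the single relevance logit above
```
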
## Limitations

1. **Accuracy:** ~3-4 NDCG@10 lower than the 8B models
2. **Complex Queries:** May miss subtle nuances
3. **Document Length:** Limited to 196 tokens
4. **Language:** English only
5. **Domain:** Optimized for web documents

## Related Models

**DeAR 3B Family:**
- [DeAR-3B-RankNet](https://huggingface.co/abdoelsayed/dear-3b-reranker-ranknet-v1) - RankNet variant (slightly higher DL19/DL20 scores)
- [DeAR-3B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-lora-v1) - LoRA adapter

**Larger Models:**
- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1) - Higher accuracy

**Resources:**
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)