abdoelsayed committed
Commit 70d8365 · verified · 1 Parent(s): ba36055

Create README.md

Files changed (1):
  1. README.md +355 -0
---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- binary-cross-entropy
- efficient
- llama
base_model: meta-llama/Llama-3.2-3B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-3B-Reranker-CE-v1

## Model Description

**DeAR-3B-Reranker-CE-v1** is an efficient 3B-parameter pointwise reranker trained with Binary Cross-Entropy (BCE) loss and knowledge distillation. It provides fast, reliable reranking for production environments where latency and efficiency are critical.

## Model Details

- **Model Type:** Pointwise Reranker (Binary Classification)
- **Base Model:** LLaMA-3.2-3B
- **Parameters:** 3 billion
- **Training Method:** Knowledge Distillation + Binary Cross-Entropy
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO + DeAR-COT
- **Model Size:** 6GB (BF16)

## Key Features

✅ **Ultra Fast:** ~1.5s inference (fastest in the DeAR family)
✅ **Memory Efficient:** Runs on a single 16GB GPU
✅ **Production Ready:** Stable training with BCE loss
✅ **Cost Effective:** Lower computational cost
✅ **Binary Classification:** Probabilistic relevance scores (see the sketch below)

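Because the model is trained with a binary cross-entropy objective, its single output logit can be mapped to a relevance probability with a sigmoid. A minimal sketch (the helper name is ours, and `score` stands for the raw logit returned in the Quick Start below):

```python
import math

def logit_to_probability(score: float) -> float:
    """Map a raw relevance logit to a probability in [0, 1] via the sigmoid."""
    return 1.0 / (1.0 + math.exp(-score))

# Example: a logit of -6.06 corresponds to a probability of roughly 0.002,
# while a logit of +2.0 corresponds to roughly 0.88.
print(logit_to_probability(-6.06))
print(logit_to_probability(2.0))
```
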
## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-3b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```

### Efficient Batch Processing

```python
import torch
from typing import List, Tuple

@torch.inference_mode()
def fast_rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 128):
    """Fast reranking optimized for the 3B model. `docs` is a list of (title, passage) pairs."""
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]

        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {t} {p}" for t, p in batch]

        # Tokenize
        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Score
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    # Rank documents by descending score
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Thomas Edison invented the light bulb in 1879"),
    ("", "Coffee is good for diet"),
    ("", "Lightning strike at Seoul National University"),
]

ranking = fast_rerank(tokenizer, model, query, docs, batch_size=128)
print(ranking)
# Example output:
# [(0, -6.0625), (2, -11.125), (1, -12.0625)]
```

### Production Optimization

```python
# Optimize for maximum throughput
model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ce-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

# Compile for a 20-30% speedup (PyTorch 2.0+)
if hasattr(torch, 'compile'):
    model = torch.compile(model, mode="max-autotune")

# Use larger batches for throughput
batch_size = 128  # the 3B model can handle larger batches
```

## Training Details

### Training Configuration

```json
{
  "base_model": "meta-llama/Llama-3.2-3B",
  "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
  "loss": "Binary Cross-Entropy",
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1
  },
  "learning_rate": 1e-4,
  "batch_size": 4,
  "gradient_accumulation": 2,
  "epochs": 2,
  "max_length": 228,
  "bf16": true
}
```

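The exact training code lives in the GitHub repository linked below. As a rough illustration of how the configuration above could combine the two objectives, here is a hedged sketch of a BCE-plus-distillation loss in PyTorch. The weighting scheme (`alpha` scaling the distillation term, `temperature` softening both student and teacher logits) and the function name are assumptions for illustration, not the verified implementation:

```python
import torch
import torch.nn.functional as F

def distillation_bce_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          labels: torch.Tensor,
                          temperature: float = 2.0,
                          alpha: float = 0.1) -> torch.Tensor:
    """Hypothetical combination of hard-label BCE and teacher distillation
    for a pointwise reranker that emits one relevance logit per pair."""
    # Hard-label term: standard binary cross-entropy against 0/1 relevance labels.
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels.float())

    # Soft-label term: match the teacher's temperature-softened relevance probability.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)

    # alpha weights the distillation term (0.1 in the configuration above).
    return (1 - alpha) * hard_loss + alpha * soft_loss
```
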
### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~17 hours
- **Memory Usage:** ~24GB per GPU
- **Trainable Parameters:** 3B

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 |
|---------|---------|---------|--------|
| DL19 | 70.8 | 67.3 | 83.9 |
| DL20 | 68.9 | 65.8 | 81.7 |

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 65.3 |
| NQ | 48.7 |
| HotpotQA | 57.9 |
| FiQA | 43.6 |
| ArguAna | 55.8 |
| SciFact | 70.2 |
| TREC-COVID | 81.8 |
| NFCorpus | 37.2 |
| **Average** | **41.7** |

### Efficiency

| Metric | 3B-CE | 8B-CE | Improvement |
|--------|-------|-------|-------------|
| Inference (100 docs) | 1.5s | 2.2s | **1.5x faster** |
| Throughput | 67 docs/s | 45 docs/s | **1.5x** |
| GPU Memory | 12GB | 18GB | **33% less** |
| Model Size | 6GB | 16GB | **62% smaller** |

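To reproduce metrics such as NDCG@10 for your own runs, the reranker output can be scored against standard qrels with a TREC-style evaluator. A minimal sketch using the `pytrec_eval` package (our choice for illustration; the qrels and document IDs below are placeholders, not the official files):

```python
import pytrec_eval

# Placeholder qrels: {query_id: {doc_id: relevance_grade}}
qrels = {"q1": {"d1": 1, "d2": 0, "d3": 0}}

# Reranker run: {query_id: {doc_id: score}}, e.g. built from fast_rerank output.
run = {"q1": {"d1": -6.0625, "d2": -12.0625, "d3": -11.125}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut"})
results = evaluator.evaluate(run)
print(results["q1"]["ndcg_cut_10"])
```
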
## Comparison

### vs. Other 3B Models

| Model | Loss | DL19 | DL20 | Speed (s) |
|-------|------|------|------|-----------|
| **DeAR-3B-CE** | BCE | 70.8 | 68.9 | 1.5 |
| DeAR-3B-RankNet | RankNet | 71.2 | 69.4 | 1.5 |
| MonoT5-3B | - | 71.8 | 68.9 | 3.5 |

**Key Advantages:**
- 2.3x faster than MonoT5-3B
- Comparable accuracy
- More stable training (BCE vs. more complex ranking losses)

## When to Use

**Best for:**
- ✅ High-throughput production systems
- ✅ Real-time applications (latency <2s)
- ✅ Cost-sensitive deployments
- ✅ Edge deployment (smaller GPUs)
- ✅ Binary relevance tasks

**Consider alternatives for:**
- ❌ Maximum accuracy (use the 8B models)
- ❌ Complex reasoning queries (use a listwise reranker)
- ❌ Unlimited compute budget

## Deployment Examples

### REST API Server

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI()

# Load model once at startup
tokenizer, model = None, None

@app.on_event("startup")
async def load_model():
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-3b-reranker-ce-v1")
    model = AutoModelForSequenceClassification.from_pretrained(
        "abdoelsayed/dear-3b-reranker-ce-v1",
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    model.eval()
    if hasattr(torch, 'compile'):
        model = torch.compile(model)

class RerankRequest(BaseModel):
    query: str
    documents: List[str]

@app.post("/rerank")
async def rerank(request: RerankRequest):
    # fast_rerank is defined in "Efficient Batch Processing" above;
    # documents are passed as (title, passage) pairs with empty titles.
    ranking = fast_rerank(tokenizer, model, request.query,
                          [("", doc) for doc in request.documents])
    return {"ranking": ranking}
```

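Once the server is running (for example via `uvicorn app:app`), it can be queried from any HTTP client. A small sketch using the `requests` library, assuming the default local address and port:

```python
import requests

response = requests.post(
    "http://localhost:8000/rerank",
    json={
        "query": "When did Thomas Edison invent the light bulb?",
        "documents": [
            "Thomas Edison invented the light bulb in 1879",
            "Coffee is good for diet",
        ],
    },
)
# The ranking is a list of (document_index, score) pairs, highest score first.
print(response.json()["ranking"])
```
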
### Batch Processing Script

```python
import pandas as pd
from tqdm import tqdm

# Load queries and documents; each row is expected to provide a query string
# and a list of (title, passage) pairs in the 'documents' column.
df = pd.read_csv("queries_docs.csv")

results = []
for _, row in tqdm(df.iterrows(), total=len(df)):
    ranking = fast_rerank(tokenizer, model, row['query'], row['documents'])
    results.append({
        'query_id': row['query_id'],
        'ranking': ranking
    })

# Save results
pd.DataFrame(results).to_csv("reranked.csv", index=False)
```

## Model Architecture

```
Input: "query: [Q] [SEP] document: [D]"
        ↓
LLaMA-3.2-3B (24 layers, 3072 hidden)
        ↓
[CLS] Token Pooling
        ↓
Linear(3072 → 1)
        ↓
Binary Relevance Score
```

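To double-check the scoring-head shape described above on a downloaded checkpoint, the published configuration can be inspected directly. A quick check (the comments state expectations based on the diagram, not guarantees):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("abdoelsayed/dear-3b-reranker-ce-v1")
print(config.hidden_size)   # backbone hidden width feeding the linear scoring head
print(config.num_labels)    # expected to be 1, matching the single relevance logit above
```
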
## Limitations

1. **Accuracy:** ~3-4 NDCG@10 lower than the 8B models
2. **Complex Queries:** May miss subtle nuances
3. **Document Length:** Limited to 196 tokens
4. **Language:** English only
5. **Domain:** Optimized for web documents

## Related Models

**DeAR 3B Family:**
- [DeAR-3B-RankNet](https://huggingface.co/abdoelsayed/dear-3b-reranker-ranknet-v1) - RankNet variant (slightly higher DL19/DL20 scores)
- [DeAR-3B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-lora-v1) - LoRA adapter

**Larger Models:**
- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1) - Higher accuracy

**Resources:**
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)