---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- binary-cross-entropy
- efficient
- llama
base_model: meta-llama/Llama-3.2-3B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-3B-Reranker-CE-v1

## Model Description

**DeAR-3B-Reranker-CE-v1** is an efficient 3B-parameter neural reranker trained with binary cross-entropy (BCE) loss and knowledge distillation. It provides fast, reliable reranking for production environments where speed and cost are critical.

## Model Details

- **Model Type:** Pointwise Reranker (Binary Classification)
- **Base Model:** LLaMA-3.2-3B
- **Parameters:** 3 billion
- **Training Method:** Knowledge Distillation + Binary Cross-Entropy
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO + DeAR-COT
- **Model Size:** 6GB (BF16)

## Key Features

- ✅ **Ultra Fast:** 1.5s to rerank 100 documents (fastest in the DeAR family)
- ✅ **Memory Efficient:** Runs on a single 16GB GPU
- ✅ **Production Ready:** Stable training with BCE loss
- ✅ **Cost Effective:** Lower computational costs
- ✅ **Binary Classification:** Probabilistic relevance scores

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-3b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```

### Efficient Batch Processing

```python
import torch
from typing import List, Tuple

@torch.inference_mode()
def fast_rerank(tokenizer, model, query: str,
                docs: List[Tuple[str, str]],
                batch_size: int = 128):
    """Fast reranking optimized for the 3B model.

    docs is a list of (title, passage) pairs.
    """
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]

        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {t} {p}" for t, p in batch]

        # Tokenize
        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Score
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    # Rank by descending score
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)

# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Thomas Edison invented the light bulb in 1879"),
    ("", "Coffee is good for diet"),
    ("", "Lightning strike at Seoul National University"),
]

ranking = fast_rerank(tokenizer, model, query, docs, batch_size=128)
print(ranking)
# Example output:
# [(0, -6.0625), (2, -11.125), (1, -12.0625)]
```
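The scores above are raw logits, hence the negative values in the example output. Since the head was trained with binary cross-entropy on a single logit, a sigmoid maps each score to a relevance probability when a bounded value in (0, 1) is more convenient. A minimal sketch:

```python
import torch

# Sigmoid turns a BCE-trained logit into a relevance probability in (0, 1).
# The mapping is monotonic, so the ranking order is unchanged.
logit = -6.0625  # e.g., the top score from the example above
probability = torch.sigmoid(torch.tensor(logit)).item()
print(f"Relevance probability: {probability:.4f}")  # ~0.0023
```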
### Production Optimization

```python
# Optimize for maximum throughput
model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ce-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

# Compile for a 20-30% speedup (PyTorch 2.0+)
if hasattr(torch, 'compile'):
    model = torch.compile(model, mode="max-autotune")

# Use larger batches for throughput
batch_size = 128  # the 3B model can handle larger batches
```

## Training Details

### Training Configuration

```json
{
  "base_model": "meta-llama/Llama-3.2-3B",
  "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
  "loss": "Binary Cross-Entropy",
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1
  },
  "learning_rate": 1e-4,
  "batch_size": 4,
  "gradient_accumulation": 2,
  "epochs": 2,
  "max_length": 228,
  "bf16": true
}
```
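For intuition on how the `temperature` and `alpha` settings above are typically combined, the sketch below interpolates a hard-label BCE term with a soft term that matches the teacher's temperature-scaled probabilities. This is an illustrative sketch under a common convention, not the authors' training code; `student_logits`, `teacher_logits`, and `labels` are placeholder tensors.

```python
import torch
import torch.nn.functional as F

def distill_bce_loss(student_logits: torch.Tensor,
                     teacher_logits: torch.Tensor,
                     labels: torch.Tensor,
                     temperature: float = 2.0,
                     alpha: float = 0.1) -> torch.Tensor:
    """Sketch of a BCE + knowledge-distillation objective (not the official code)."""
    # Hard-label term: standard binary cross-entropy on gold relevance labels.
    hard = F.binary_cross_entropy_with_logits(student_logits, labels.float())
    # Soft-label term: match the teacher's temperature-softened probabilities.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft = F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)
    # One common convention: alpha weights the teacher (soft) term.
    return alpha * soft + (1.0 - alpha) * hard
```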
### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~17 hours
- **Memory Usage:** ~24GB per GPU
- **Trainable Parameters:** 3B (full fine-tuning)

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 |
|---------|---------|---------|--------|
| DL19 | 70.8 | 67.3 | 83.9 |
| DL20 | 68.9 | 65.8 | 81.7 |

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 65.3 |
| NQ | 48.7 |
| HotpotQA | 57.9 |
| FiQA | 43.6 |
| ArguAna | 55.8 |
| SciFact | 70.2 |
| TREC-COVID | 81.8 |
| NFCorpus | 37.2 |
| **Average** | **57.6** |

### Efficiency

| Metric | 3B-CE | 8B-CE | Improvement |
|--------|-------|-------|-------------|
| Inference (100 docs) | 1.5s | 2.2s | **1.5x faster** |
| Throughput | 67 docs/s | 45 docs/s | **1.5x** |
| GPU Memory | 12GB | 18GB | **33% less** |
| Model Size | 6GB | 16GB | **62% smaller** |

## Comparison

### vs. Other 3B Models

| Model | Loss | DL19 | DL20 | Speed (s) |
|-------|------|------|------|-----------|
| **DeAR-3B-CE** | BCE | 70.8 | 68.9 | 1.5 |
| DeAR-3B-RankNet | RankNet | 71.2 | 69.4 | 1.5 |
| MonoT5-3B | - | 71.8 | 68.9 | 3.5 |

**Key Advantages:**
- 2.3x faster than MonoT5-3B
- Comparable accuracy
- More stable training (BCE vs. more complex ranking losses)

## When to Use

**Best for:**
- ✅ High-throughput production systems
- ✅ Real-time applications (latency < 2s)
- ✅ Cost-sensitive deployments
- ✅ Edge deployment (smaller GPUs)
- ✅ Binary relevance tasks

**Consider alternatives for:**
- ❌ Maximum accuracy (use the 8B models)
- ❌ Complex reasoning queries (use a listwise reranker)
- ❌ Settings where compute budget is not a constraint

## Deployment Examples

### REST API Server

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI()

# Load model once at startup
tokenizer, model = None, None

@app.on_event("startup")
async def load_model():
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-3b-reranker-ce-v1")
    model = AutoModelForSequenceClassification.from_pretrained(
        "abdoelsayed/dear-3b-reranker-ce-v1",
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    model.eval()
    if hasattr(torch, 'compile'):
        model = torch.compile(model)

class RerankRequest(BaseModel):
    query: str
    documents: List[str]

@app.post("/rerank")
async def rerank(request: RerankRequest):
    # fast_rerank is defined in "Efficient Batch Processing" above;
    # plain documents arrive without titles, so pass empty-title pairs.
    ranking = fast_rerank(tokenizer, model, request.query,
                          [("", doc) for doc in request.documents])
    return {"ranking": ranking}
```

### Batch Processing Script

```python
import pandas as pd
from tqdm import tqdm

# Load queries and documents; the 'documents' column is assumed to
# hold a list of (title, text) pairs per row.
df = pd.read_csv("queries_docs.csv")

results = []
for _, row in tqdm(df.iterrows(), total=len(df)):
    ranking = fast_rerank(tokenizer, model, row['query'], row['documents'])
    results.append({
        'query_id': row['query_id'],
        'ranking': ranking
    })

# Save results
pd.DataFrame(results).to_csv("reranked.csv", index=False)
```

## Model Architecture

```
Input: "query: [Q]" + "document: [D]" (encoded as a pair)
          ↓
LLaMA-3.2-3B decoder (28 layers, hidden size 3072)
          ↓
Last-Token Pooling
          ↓
Linear(3072 → 1)
          ↓
Binary Relevance Score
```

## Limitations

1. **Accuracy:** ~3-4 NDCG@10 lower than the 8B models
2. **Complex Queries:** May miss subtle nuances
3. **Document Length:** Document text is truncated to fit the 228-token input (roughly 196 tokens after the query)
4. **Language:** English only
5. **Domain:** Optimized for web documents

## Related Models

**DeAR 3B Family:**
- [DeAR-3B-RankNet](https://huggingface.co/abdoelsayed/dear-3b-reranker-ranknet-v1) - RankNet variant (slightly better accuracy)
- [DeAR-3B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-lora-v1) - LoRA adapter

**Larger Models:**
- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1) - Higher accuracy

**Resources:**
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)