---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-large-instruct
pipeline_tag: text-classification
---

# E5-EG-large

A lightweight multilingual model for temporal classification of questions, fine-tuned from [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct).

## Model Details

### Model Description

E5-EG-large (E5 EverGreen - Large) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. This model offers a balanced trade-off between classification quality and computational efficiency.

- **Model type:** Text Classification
- **Base model:** [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese
- **License:** MIT

### Model Sources

- **Repository:** [GitHub](https://github.com/s-nlp/Evergreen-classification)
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import time

# Load model and tokenizer
model_name = "s-nlp/E5-EG-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# For optimal performance, use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?"
]

# Tokenize all questions
inputs = tokenizer(
    questions,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=64
).to(device)

# Classify
start_time = time.time()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)
inference_time = (time.time() - start_time) * 1000  # ms

# Display results
class_names = ["Immutable", "Mutable"]
for i, question in enumerate(questions):
    print(f"Q: {question}")
    print(f"  Classification: {class_names[predicted_classes[i].item()]}")
    print(f"  Confidence: {predictions[i][predicted_classes[i]].item():.2f}")

print(f"\nTotal inference time: {inference_time:.2f}ms")
print(f"Average per question: {inference_time/len(questions):.2f}ms")
```

## Training Details

### Training Data

Same multilingual dataset as E5-EG-small:
- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data

### Training Procedure

#### Preprocessing

- Identical to E5-EG-small
- Maximum sequence length: 64 tokens
- Multilingual tokenization

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 5e-05
- **Warmup steps:** 300
- **Weight decay:** 0.01
- **Optimizer:** AdamW
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting (see the sketch at the end of this section)
- **Gradient accumulation steps:** 1

#### Hardware

- **GPUs:** Single NVIDIA V100
- **Training time:** ~8 hours
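The focal loss with class weighting listed above is not shipped with the checkpoint, so the snippet below is only a minimal sketch of how such an objective might look, assuming the standard formulation FL(p_t) = -α(1 − p_t)^γ · log(p_t). The `WeightedFocalLoss` name and the example class weights are illustrative, not taken from the actual training code.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFocalLoss(nn.Module):
    """Focal loss with optional per-class weights (illustrative sketch, not the original training code)."""

    def __init__(self, gamma: float = 2.0, alpha: float = 0.25,
                 class_weights: Optional[torch.Tensor] = None):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.class_weights = class_weights

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Log-probability of the true class for every example in the batch
        log_probs = F.log_softmax(logits, dim=-1)
        log_p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        p_t = log_p_t.exp()

        # Focal term down-weights easy, already well-classified examples
        loss = -self.alpha * (1.0 - p_t) ** self.gamma * log_p_t

        # Optional per-class weighting to counter class imbalance
        if self.class_weights is not None:
            loss = loss * self.class_weights.to(logits.device)[targets]
        return loss.mean()


# Hyperparameters from this card; the class weights here are placeholders
loss_fn = WeightedFocalLoss(gamma=2.0, alpha=0.25,
                            class_weights=torch.tensor([1.0, 1.0]))
```

In practice, a loss like this would typically be plugged into fine-tuning by overriding `compute_loss` in a custom `Trainer` subclass.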
## Evaluation

### Testing Data

Same test sets as E5-EG-small (2,100 samples per language).

### Metrics

#### Overall Performance

| Metric | Score |
|--------|-------|
| Overall F1 | 0.89 |
| Overall Accuracy | 0.88 |

#### Per-Language F1 Scores

| Language | F1 Score |
|----------|----------|
| English | 0.92 |
| Chinese | 0.91 |
| French | 0.90 |
| German | 0.89 |
| Russian | 0.88 |
| Hebrew | 0.87 |
| Arabic | 0.86 |

#### Class-wise Performance

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Immutable | 0.87 | 0.90 | 0.88 |
| Mutable | 0.90 | 0.87 | 0.88 |

### Model Comparison

| Model | Parameters | Overall F1 | Inference Time (ms) |
|-------|------------|------------|---------------------|
| E5-EG-large | 560M | 0.89 | 45 |
| E5-EG-small | 118M | 0.85 | 12 |
| mDeBERTa-base | 278M | 0.87 | 28 |
| mBERT | 177M | 0.85 | 20 |

## Citation

**BibTeX:**

```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
      title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
      author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
      year={2025},
      eprint={2505.21115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21115},
}
```