---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-large-instruct
pipeline_tag: text-classification
---

# E5-EG-large

A lightweight multilingual model for temporal classification of questions, fine-tuned from [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct).

## Model Details

### Model Description

E5-EG-large (E5 EverGreen - Large) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. This model offers a balanced trade-off between classification quality and computational efficiency.

- **Model type:** Text Classification
- **Base model:** [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese
- **License:** MIT

### Model Sources

- **Repository:** [GitHub](https://github.com/s-nlp/Evergreen-classification)
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import time

# Load model and tokenizer
model_name = "s-nlp/E5-EG-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# For optimal performance, use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?"
]

# Tokenize all questions
inputs = tokenizer(
    questions,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=64
).to(device)

# Classify
start_time = time.time()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)
inference_time = (time.time() - start_time) * 1000  # ms

# Display results
class_names = ["Immutable", "Mutable"]
for i, question in enumerate(questions):
    print(f"Q: {question}")
    print(f"  Classification: {class_names[predicted_classes[i].item()]}")
    print(f"  Confidence: {predictions[i][predicted_classes[i]].item():.2f}")

print(f"\nTotal inference time: {inference_time:.2f}ms")
print(f"Average per question: {inference_time/len(questions):.2f}ms")
```

## Training Details

### Training Data

Same multilingual dataset as E5-EG-small:
- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data

### Training Procedure

#### Preprocessing

- Identical to E5-EG-small
- Maximum sequence length: 64 tokens
- Multilingual tokenization

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 5e-05
- **Warmup steps:** 300
- **Weight decay:** 0.01
- **Optimizer:** AdamW
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting (see the sketch at the end of this section)
- **Gradient accumulation steps:** 1

#### Hardware

- **GPUs:** Single NVIDIA V100
- **Training time:** ~8 hours
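The focal loss with class weighting listed above is not shipped with the checkpoint, so the snippet below is only a minimal sketch of how such an objective might look, assuming the standard formulation FL(p_t) = -α(1 − p_t)^γ · log(p_t). The `WeightedFocalLoss` name and the example class weights are illustrative, not taken from the actual training code.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFocalLoss(nn.Module):
    """Focal loss with optional per-class weights (illustrative sketch, not the original training code)."""

    def __init__(self, gamma: float = 2.0, alpha: float = 0.25,
                 class_weights: Optional[torch.Tensor] = None):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.class_weights = class_weights

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Log-probability of the true class for every example in the batch
        log_probs = F.log_softmax(logits, dim=-1)
        log_p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        p_t = log_p_t.exp()

        # Focal term down-weights easy, already well-classified examples
        loss = -self.alpha * (1.0 - p_t) ** self.gamma * log_p_t

        # Optional per-class weighting to counter class imbalance
        if self.class_weights is not None:
            loss = loss * self.class_weights.to(logits.device)[targets]
        return loss.mean()


# Hyperparameters from this card; the class weights here are placeholders
loss_fn = WeightedFocalLoss(gamma=2.0, alpha=0.25,
                            class_weights=torch.tensor([1.0, 1.0]))
```

In practice, a loss like this would typically be plugged into fine-tuning by overriding `compute_loss` in a custom `Trainer` subclass.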
## Evaluation

### Testing Data

Same test sets as E5-EG-small (2,100 samples per language).

### Metrics

#### Overall Performance

| Metric | Score |
|--------|-------|
| Overall F1 | 0.89 |
| Overall Accuracy | 0.88 |

#### Per-Language F1 Scores

| Language | F1 Score |
|----------|----------|
| English | 0.92 |
| Chinese | 0.91 |
| French | 0.90 |
| German | 0.89 |
| Russian | 0.88 |
| Hebrew | 0.87 |
| Arabic | 0.86 |

#### Class-wise Performance

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Immutable | 0.87 | 0.90 | 0.88 |
| Mutable | 0.90 | 0.87 | 0.88 |

### Model Comparison

| Model | Parameters | Overall F1 | Inference Time (ms) |
|-------|------------|------------|---------------------|
| E5-EG-large | 560M | 0.89 | 45 |
| E5-EG-small | 118M | 0.85 | 12 |
| mDeBERTa-base | 278M | 0.87 | 28 |
| mBERT | 177M | 0.85 | 20 |

## Citation

**BibTeX:**

```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
      title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
      author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
      year={2025},
      eprint={2505.21115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21115},
}
```