---
tags:
- mteb
- sentence-transformers
- transformers
- qwen
- feature-extraction
- text-classification
- text-clustering
- text-retrieval
- text-reranking
- text-pair-classification
- text-multilabel-classification
- text-bitext-mining
library_name: sentence-transformers
base_model: Qwen/Qwen3-Embedding-8B
license: apache-2.0
language:
- en
- multilingual
extra_gated_eu_disallowed: true
---
# Euler-Legal-Embedding-V1
## Short Description
Euler-Legal-Embedding-V1 is a specialized embedding model for the legal domain, fine-tuned from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B). It achieves strong performance on legal retrieval and reasoning tasks in the MTEB benchmark.
## Model Details
- **Base Model**: Qwen/Qwen3-Embedding-8B
- **Model Size**: ~8B parameters
- **Embedding Dimension**: 4096 (default for Qwen3-Embedding-8B)
- **Max Input Tokens**: 1536
- **Pooling**: Last-token pooling (standard for Qwen embedding models)
- **Training Data**: Legal-domain dataset (`final-data-new-anonymized-grok4-filtered.jsonl`)
## Usage
### sentence-transformers support
Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
```bash
pip install -U sentence-transformers
```
You can use the model like this:
```python
from sentence_transformers import SentenceTransformer
import torch

# Load the model
# trust_remote_code=True is required for Qwen-based models
model = SentenceTransformer(
    "Mira190/Euler-Legal-Embedding-V1",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional, requires flash-attn to be installed
    },
)
model.max_seq_length = 1536

sentences = [
    "The plaintiff filed a motion for summary judgment.",
    "The court granted the motion based on lack of genuine dispute of material fact.",
]

# No specific prompt is required for this version
embeddings = model.encode(
    sentences,
    normalize_embeddings=True,
    batch_size=16,
    show_progress_bar=True,
)
print(embeddings.shape)
# Output: (2, 4096)
```
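Since `normalize_embeddings=True` returns unit-length vectors, cosine similarity reduces to a dot product. A minimal retrieval-style sketch (the query string is illustrative; `model.similarity` requires sentence-transformers v3.0+, on older versions use `query_embedding @ embeddings.T` instead):

```python
# Score an illustrative query against the two sentences encoded above
query_embedding = model.encode(
    ["Was there a genuine dispute of material fact?"],
    normalize_embeddings=True,
)
scores = model.similarity(query_embedding, embeddings)  # cosine similarity, shape (1, 2)
print(scores)
```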
### Transformers support
You can also use the model directly with the `transformers` library:
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Mira190/Euler-Legal-Embedding-V1"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def last_token_pool(last_hidden_states, attention_mask):
    # Last-token pooling (standard for Qwen embedding models): take the hidden
    # state of each sequence's final non-padding token. Taking [:, -1] directly
    # is only correct with left padding; this handles both padding sides.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(last_hidden_states.shape[0], device=last_hidden_states.device)
    return last_hidden_states[batch_indices, sequence_lengths]

sentences = ["This is a legal document.", "This is another legal document."]

# Tokenize sentences
inputs = tokenizer(
    sentences,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1536,
)
# Move inputs to the same device as the model
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

embeddings = last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])

# Normalize embeddings
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)
# Output: (2, 4096)
```
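Because the embeddings are L2-normalized, pairwise cosine similarities are a single matrix multiply:

```python
# Pairwise cosine similarity between the encoded sentences
similarity_matrix = embeddings @ embeddings.T  # shape (2, 2), diagonal is ~1.0
print(similarity_matrix)
```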
## Training Details
The model was fine-tuned using LoRA (Low-Rank Adaptation) via the Swift framework.
- **Framework**: Swift
- **Loss Function**: InfoNCE (temperature 0.03; see the sketch below)
- **Batch Size**: 4 (per device)
- **Learning Rate**: 2e-5
- **LoRA Config**: Rank 8, Alpha 32, Dropout 0.05
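
For reference, the InfoNCE objective above can be sketched as follows. This is a generic formulation for illustration, not the exact Swift training code; it assumes each query is paired with one positive document, with the rest of the batch serving as in-batch negatives:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, doc_emb: torch.Tensor, temperature: float = 0.03) -> torch.Tensor:
    # query_emb, doc_emb: (batch, dim) L2-normalized embeddings where
    # doc_emb[i] is the positive for query_emb[i]; all other documents
    # in the batch act as negatives
    logits = query_emb @ doc_emb.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(logits.shape[0], device=logits.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```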
## Citation
If you find this model useful, please consider citing:
```bibtex
@misc{euler2025legal,
  title={Euler-Legal-Embedding: Advanced Legal Representation Learning},
  author={LawRank Team},
  year={2025},
  publisher={Hugging Face}
}
```