---
tags:
- mteb
- sentence-transformers
- transformers
- qwen
- feature-extraction
- text-classification
- text-clustering
- text-retrieval
- text-reranking
- text-pair-classification
- text-multilabel-classification
- text-bitext-mining
library_name: sentence-transformers
base_model: Qwen/Qwen3-Embedding-8B
license: apache-2.0
language:
- en
- multilingual
extra_gated_eu_disallowed: true
---

# Euler-Legal-Embedding-V1

## Short Description

Euler-Legal-Embedding-V1 is a specialized embedding model for the legal domain, fine-tuned from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B). It achieves strong performance on legal retrieval and reasoning tasks within the MTEB benchmark.

## Model Details

- **Base Model**: Qwen/Qwen3-Embedding-8B
- **Model Size**: ~8B parameters
- **Embedding Dimension**: 4096 (default for Qwen3-8B)
- **Max Input Tokens**: 1536
- **Pooling**: Last-token pooling (standard for Qwen embedding models)
- **Training Data**: Legal-domain dataset (`final-data-new-anonymized-grok4-filtered.jsonl`)

## Usage

### sentence-transformers support

Using this model is straightforward once [sentence-transformers](https://www.SBERT.net) is installed:

```bash
pip install -U sentence-transformers
```

You can then use the model like this:

```python
from sentence_transformers import SentenceTransformer
import torch

# Load the model.
# trust_remote_code=True is required for Qwen-based models.
model = SentenceTransformer(
    "Mira190/Euler-Legal-Embedding-V1",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional; requires the flash-attn package
    },
)
model.max_seq_length = 1536

sentences = [
    "The plaintiff filed a motion for summary judgment.",
    "The court granted the motion based on lack of genuine dispute of material fact.",
]

# No specific prompt is required for this version.
embeddings = model.encode(
    sentences,
    normalize_embeddings=True,
    batch_size=16,
    show_progress_bar=True,
)
print(embeddings.shape)  # Output: (2, 4096)
```

For a retrieval-style example that ranks candidate passages against a query, see the sketch at the end of this card.

### Transformers support

You can also use the model directly with the `transformers` library:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Mira190/Euler-Legal-Embedding-V1"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)


def last_token_pool(last_hidden_state, attention_mask):
    # Last-token pooling (standard for Qwen embedding models): take the
    # hidden state of each sequence's final real token (EOS), skipping padding.
    left_padding = attention_mask[:, -1].sum() == attention_mask.shape[0]
    if left_padding:
        return last_hidden_state[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(last_hidden_state.shape[0], device=last_hidden_state.device)
    return last_hidden_state[batch_indices, sequence_lengths]


sentences = ["This is a legal document.", "This is another legal document."]

# Tokenize sentences
inputs = tokenizer(
    sentences,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1536,
)

# Move inputs to the same device as the model
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

embeddings = last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])

# Normalize embeddings
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # Output: (2, 4096)
```

## Training Details

The model was fine-tuned with LoRA (Low-Rank Adaptation) via the Swift framework. A minimal sketch of the training objective is given at the end of this card.

- **Framework**: Swift
- **Loss Function**: InfoNCE (temperature 0.03)
- **Batch Size**: 4 (per device)
- **Learning Rate**: 2e-5
- **LoRA Config**: rank 8, alpha 32, dropout 0.05

## Citation

If you find this model useful, please consider citing:

```bibtex
@misc{euler2025legal,
  title={Euler-Legal-Embedding: Advanced Legal Representation Learning},
  author={LawRank Team},
  year={2025},
  publisher={Hugging Face}
}
```
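
## Retrieval Example (sketch)

The sketch below shows one way to rank candidate passages against a query with this model. The query, passages, and scoring code are illustrative examples, not part of the model's training setup; with `normalize_embeddings=True`, the dot product of two embeddings equals their cosine similarity.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mira190/Euler-Legal-Embedding-V1", trust_remote_code=True)
model.max_seq_length = 1536

query = "What is the standard for granting summary judgment?"
passages = [
    "Summary judgment is appropriate when there is no genuine dispute of material fact.",
    "The statute of limitations for breach of contract is four years.",
]

# Encode query and passages; normalized embeddings make dot product = cosine similarity.
query_emb = model.encode([query], normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

scores = query_emb @ passage_emb.T  # shape: (1, num_passages)
print(scores)  # Higher score = more relevant passage
```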
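
## Appendix: InfoNCE Objective (sketch)

The following is a minimal PyTorch sketch of an InfoNCE loss with in-batch negatives at temperature 0.03, matching the hyperparameters listed under Training Details. It illustrates the objective only and is not the actual Swift training code; the tensor names (`query_emb`, `pos_emb`) are hypothetical.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(query_emb, pos_emb, temperature=0.03):
    # query_emb, pos_emb: L2-normalized tensors of shape (batch, dim).
    # Similarity of every query against every positive in the batch;
    # off-diagonal entries serve as in-batch negatives.
    logits = query_emb @ pos_emb.T / temperature
    # The matching positive for query i sits on the diagonal (index i).
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


# Toy usage with random, normalized 4096-dim embeddings (the model's output size)
q = F.normalize(torch.randn(4, 4096), dim=-1)
p = F.normalize(torch.randn(4, 4096), dim=-1)
print(info_nce_loss(q, p))
```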