MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol
Abstract
MLAIRE, a multilingual information retrieval evaluation protocol, separates semantic retrieval accuracy from query-language preference to better assess retrieval utility over mixed-language corpora.
Multilingual information retrieval is increasingly important in real-world search settings, where users issue queries over mixed-language corpora. Existing evaluations mainly reward language-agnostic semantic relevance, treating relevant passages equally regardless of their language. Yet retrieval utility also depends on the language of the retrieved passages: users may prefer results they can read and verify in the query language, and a query-passage language mismatch can complicate downstream grounding and answer verification in Retrieval-Augmented Generation (RAG) systems. To evaluate this language-aware dimension, we introduce MLAIRE, a Multilingual Language-Aware Information Retrieval Evaluation protocol that disentangles cross-lingual semantic retrieval from query-language preference. MLAIRE constructs controlled pools with parallel passages across languages, which makes it possible to measure both semantic retrieval accuracy and query-language preference when equivalent translations are available. We propose language-aware metrics, including Language Preference Rate (LPR) and Lang-nDCG, together with a 4-way decomposition that separates semantic failures from query-language preference failures. Evaluating 31 dense, sparse, and late-interaction retrievers, we show that standard metrics obscure distinct behaviors: semantically strong retrievers may return correct content in a non-query language, while retrievers with a stronger query-language preference may retrieve less semantically relevant passages.
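The abstract names LPR, Lang-nDCG and the 4-way decomposition but does not define them. The Python sketch below shows one plausible way such measures could be computed over a controlled pool of parallel passages. Everything in it is a hypothetical illustration, not the paper's formulas: the rule that only the top-1 result decides the decomposition cell, the LPR definition (query-language share among top-1 semantic hits), the 2/1/0 graded gains for Lang-nDCG, the `sem_ok/lang_fail` cell labels, and all function and field names.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2


@dataclass(frozen=True)
class Passage:
    content_id: str  # shared by all parallel translations of one passage
    lang: str        # language code of this particular translation


def top1_outcome(ranking, gold_id, query_lang):
    """Hypothetical 4-way cell for the top-1 result (ranking must be non-empty):
    semantic success/failure crossed with query-language match/mismatch."""
    top = ranking[0]
    semantic = "sem_ok" if top.content_id == gold_id else "sem_fail"
    language = "lang_ok" if top.lang == query_lang else "lang_fail"
    return f"{semantic}/{language}"


def language_preference_rate(runs):
    """Assumed LPR: among queries whose top-1 result is the gold content
    (in any language), the fraction where it is in the query language."""
    hits = [r for r in runs
            if r["ranking"] and r["ranking"][0].content_id == r["gold_id"]]
    if not hits:
        return 0.0
    same_lang = sum(r["ranking"][0].lang == r["query_lang"] for r in hits)
    return same_lang / len(hits)


def lang_gain(passage, gold_id, query_lang):
    """Hypothetical graded gain: 2 for gold content in the query language,
    1 for gold content in another language, 0 for non-relevant passages."""
    if passage.content_id != gold_id:
        return 0
    return 2 if passage.lang == query_lang else 1


def dcg(gains):
    # Standard log2 position discount: rank 1 -> 1, rank 2 -> 1/log2(3), ...
    return sum(g / log2(i + 2) for i, g in enumerate(gains))


def lang_ndcg(ranking, pool, gold_id, query_lang, k=10):
    """nDCG@k under the hypothetical language-aware gains above; the ideal
    ranking is built from every passage in the controlled pool."""
    gains = [lang_gain(p, gold_id, query_lang) for p in ranking[:k]]
    ideal = sorted((lang_gain(p, gold_id, query_lang) for p in pool),
                   reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy controlled pool: passages d1 and d2, each in English and German.
    pool = [Passage("d1", "en"), Passage("d1", "de"),
            Passage("d2", "en"), Passage("d2", "de")]
    runs = [
        # English query: right content, but the German translation ranks first.
        {"query_lang": "en", "gold_id": "d1",
         "ranking": [Passage("d1", "de"), Passage("d1", "en"), Passage("d2", "en")]},
        # German query: right content in the query language.
        {"query_lang": "de", "gold_id": "d2",
         "ranking": [Passage("d2", "de"), Passage("d1", "de")]},
    ]
    print(Counter(top1_outcome(r["ranking"], r["gold_id"], r["query_lang"])
                  for r in runs))
    print(language_preference_rate(runs))  # 0.5
    r = runs[0]
    print(round(lang_ndcg(r["ranking"], pool, r["gold_id"], r["query_lang"]), 3))
```

In this toy run, the English query retrieves the correct passage in German first, so it falls into the semantic-success, language-failure cell. A language-agnostic nDCG with binary relevance would score that ranking as perfect, whereas the assumed graded gains lower its Lang-nDCG below 1. This gap is the kind of behavior the abstract says standard metrics obscure.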
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG (2026)
- CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training (2026)
- CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG (2026)
- Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers (2026)
- From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents (2026)
- CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation (2026)
- Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment (2026)
Get this paper in your agent:
hf papers read 2605.07249
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash