Instructions to use Alibaba-NLP/gte-multilingual-reranker-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Alibaba-NLP/gte-multilingual-reranker-base with sentence-transformers:
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("Alibaba-NLP/gte-multilingual-reranker-base", trust_remote_code=True)

    query = "Which planet is known as the Red Planet?"
    passages = [
        "Venus is often called Earth's twin because of its similar size and proximity.",
        "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
        "Jupiter, the largest planet in our solar system, has a prominent red spot.",
        "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
    ]

    scores = model.predict([(query, passage) for passage in passages])
    print(scores)
- Transformers
How to use Alibaba-NLP/gte-multilingual-reranker-base with Transformers:
    # Load model directly
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained("Alibaba-NLP/gte-multilingual-reranker-base", trust_remote_code=True, dtype="auto")
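The snippets above stop at raw relevance scores. As a minimal sketch of the usual next step, the helper below orders passages by score, highest first. The function name `rank_passages`, the `top_k` parameter, and the score values are hypothetical and only for illustration; they are not part of the model card or of either library, and the scores are placeholders, not real model output.

```python
from typing import List, Optional, Sequence, Tuple


def rank_passages(
    passages: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Pair each passage with its score and sort by descending score.

    `scores` is assumed to hold one relevance score per passage, in the
    same order as `passages` (for example, the output of a reranker).
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")

    # float() also accepts NumPy scalars, so array outputs can be passed in.
    paired = zip(passages, (float(s) for s in scores))
    ranked = sorted(paired, key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    # Placeholder scores for illustration only (not produced by the model).
    demo_passages = ["passage about Venus", "passage about Mars", "passage about Jupiter"]
    demo_scores = [0.12, 0.97, 0.35]
    for text, score in rank_passages(demo_passages, demo_scores, top_k=2):
        print(f"{score:.2f}  {text}")
```

Sorting on the score alone keeps the helper independent of how the scores were computed, so the same function should work for any list or array of per-passage scores.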
Is this model fine-tuned on MS MARCO or mMARCO?
Hello, thank you for releasing this multilingual reranker model under the Apache 2.0 license. I'd like to ask whether you used any datasets that are not licensed for commercial use (e.g. MS MARCO, mMARCO) for fine-tuning or training this model. In the paper (https://arxiv.org/pdf/2407.19669), section B.2 states that MS MARCO and mMARCO-zh were used. These datasets are for research purposes only.
Could you please clarify?
Thanks.
Thank you for your inquiry regarding the multilingual reranker model. We appreciate your interest in our work.
To clarify, the model does leverage the MS MARCO and mMARCO-zh datasets, which are indeed intended for research purposes only. We acknowledge the restrictions associated with these datasets and ensure that all usage complies with the terms provided by the dataset creators.
The findings presented in the paper reflect our commitment to using high-quality, publicly available data while adhering to the specified licensing agreements.
Best regards!
Thanks for your quick response!