Model Description

GemmaX2-28-2B-v0.2 is an LLM-based translation model. It was fine-tuned from GemmaX2-28-2B-Pretrain, a language model developed through continual pretraining of Gemma2-2B on a mix of 56 billion tokens of monolingual and parallel data covering 28 languages. Please find more details in our paper: Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study.

  • Supported Languages: Arabic, Bengali, Czech, German, English, Spanish, Persian, French, Hebrew, Hindi, Indonesian, Italian, Japanese, Khmer, Korean, Lao, Malay, Burmese, Dutch, Polish, Portuguese, Russian, Thai, Tagalog, Turkish, Urdu, Vietnamese, Chinese.
  • GitHub: Please find more details in our GitHub repository.
  • Developed by: Xiaomi Inc.

Model Performance

Update: unlike GemmaX2-28-2B-v0.1, GemmaX2-28-2B-v0.2 adopts the translation instructions used for fine-tuning the MiLMMT-46 models.

Experimental Results

Translation Prompt

Translate this from <source language name> to <target language name>:
<source language name>: <source language sentence>
<target language name>:

Please use the language names listed above under Supported Languages when filling in the translation prompt.
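
As a minimal illustration, the prompt above can be assembled programmatically. The helper below is only a sketch; its name and arguments are our own and not part of the original card:

# Sketch: build the translation prompt from language names and a source sentence.
def build_prompt(src_lang: str, tgt_lang: str, src_text: str) -> str:
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}:"
    )

prompt = build_prompt("Chinese", "English", "我爱机器翻译")
print(prompt)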

Run the model

Using vLLM:

from vllm import LLM, SamplingParams


model_id = "xiaomi-research/GemmaX2-28-2B-v0.2"

# Load the model with vLLM and use greedy decoding (top_k=1, temperature=0).
model = LLM(model=model_id)
sampling_params = SamplingParams(top_k=1, temperature=0, max_tokens=2048)

# Translation prompt in the format described above.
text = "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:"

outputs = model.generate(text, sampling_params)
print(outputs[0].outputs[0].text)
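
Since vLLM's generate accepts a list of prompts, several sentences can be translated in one batched call. The snippet below is a sketch under that assumption, reusing the model and sampling_params objects defined above:

# Sketch: translate several sentences in a single batched vLLM call.
prompts = [
    "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:",
    "Translate this from English to German:\nEnglish: I love machine translation\nGerman:",
]
batch_outputs = model.generate(prompts, sampling_params)
for out in batch_outputs:
    print(out.outputs[0].text.strip())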

Using Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer


model_id = "xiaomi-research/GemmaX2-28-2B-v0.2"

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Translation prompt in the format described above.
text = "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:"
inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")

# Generate the translation; the decoded output also contains the prompt.
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
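
Because outputs[0] contains the prompt tokens as well as the generated ones, you may want only the newly generated text. The following sketch (our addition, not part of the original example) slices off the prompt before decoding:

# Sketch: decode only the newly generated tokens, dropping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
translation = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(translation.strip())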

Citation

@inproceedings{cui-etal-2025-multilingual,
    title = "Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study",
    author = "Cui, Menglong  and
      Gao, Pengzhi  and
      Liu, Wei  and
      Luan, Jian  and
      Wang, Bin",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.280/",
    doi = "10.18653/v1/2025.naacl-long.280",
    pages = "5420--5443",
    ISBN = "979-8-89176-189-6",
    abstract = "Large language models (LLMs) have shown continuously improving multilingual capabilities, and even small-scale open-source models have demonstrated rapid performance enhancement. In this paper, we systematically explore the abilities of open LLMs with less than ten billion parameters to handle multilingual machine translation (MT) tasks. We conduct comprehensive evaluations on six popular LLMs and find that models like Gemma2-9B exhibit impressive multilingual translation capabilities. We then introduce the Parallel-First Monolingual-Second (PFMS) data mixing strategy in the continual pretraining stage to further enhance the MT performance and present GemmaX2-28, a 9B model achieving top-tier multilingual translation performance across 28 languages. Specifically, GemmaX2-28 consistently outperforms the state-of-the-art (SOTA) models such as TowerInstruct and X-ALMA and achieves competitive performance with Google Translate and GPT-4-turbo."
}

Limitations

GemmaX2-28 currently supports only the 28 languages listed above, and strong translation performance is not guaranteed for other languages. We will continue to improve the translation quality of GemmaX2-28, and future model releases will follow in due course.
