Some languages are not supported by the models

#3
by TrueTuner - opened

First of all, the benchmark is excellent work and very useful for the community.

But I have a question:

You calculated the WER scores for:

  • granite-speech
  • nvidia-nemo (parakeet)
  • kyutai

However, it is clearly stated that these models do not support African languages.
Have you fine-tuned each of these models on your datasets?
Or do these models actually perform well on African languages natively?

Thank you for clarifying this for us.

Microsoft org
edited Apr 30

Thanks @TrueTuner. Yes, some of the models do not support African languages. We did not fine-tune any of the models except the Paza models; all three reported metrics are from the base models. One intention of this benchmark was to highlight the zero-shot accuracy vs. efficiency trade-offs across SOTA ASR models, as a useful reference when deciding which model to fine-tune. For example, Parakeet achieves the best RTFx on most languages, which makes it a strong efficiency baseline against all the other models. That makes it a practical candidate for fine-tuning on unsupported African languages; it is not a claim of native language support.
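
For readers less familiar with the two metrics behind this trade-off, here is a minimal sketch of how WER and RTFx are commonly defined. It is not the benchmark's evaluation code: the function names `word_error_rate` and `rtfx` are hypothetical, and it assumes plain whitespace tokenization with no text normalization, which a real evaluation would likely add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # Illustrative inputs only, not benchmark data.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(rtfx(audio_seconds=60.0, processing_seconds=2.0))
```

Lower WER means better accuracy and higher RTFx means faster inference, so a model with a high zero-shot WER on an unsupported language can still be attractive to fine-tune if its RTFx is high.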

muchai-mercy changed discussion status to closed
