yuriyvnv
posted an update May 6
📄 The WAVe paper is officially out in the Information Sciences Journal.

You saw the PT and NL model releases earlier this year. This is the peer-reviewed paper behind them, with the full method, ablations, and downstream ASR evaluation.

Quick recap: WAVe is a 1B multimodal embedding model that filters synthetic speech at the word level, not the sentence level. On Portuguese ASR it cuts training steps by 34%, improves cross-domain generalization by 50%, and matches WER with 30% less synthetic data.
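
To make the word-level vs. sentence-level distinction concrete, here is a toy sketch. It is not the WAVe API and not the scoring rule from the paper: the per-word scores, both helper functions, and the 0.5 threshold are hypothetical, chosen only to show how one badly synthesized word can hide inside a good sentence-level average.

```python
from statistics import mean

# Hypothetical per-word audio/text agreement scores in [0, 1] for one
# synthetic utterance. In practice these would come from an embedding
# model; here they are made-up numbers for illustration.
scores = [0.90, 0.95, 0.10, 0.90]  # third word is badly synthesized


def sentence_level_keep(word_scores, threshold=0.5):
    """Keep the sample if the average score over the sentence passes."""
    if not word_scores:
        return False
    return mean(word_scores) >= threshold


def word_level_keep(word_scores, threshold=0.5):
    """Keep the sample only if every single word passes."""
    if not word_scores:
        return False
    return min(word_scores) >= threshold


print(sentence_level_keep(scores))  # average hides the bad word
print(word_level_keep(scores))      # the bad word is caught
```

With these made-up numbers the sentence-level check keeps the sample while the word-level check rejects it, which is the kind of error a word-level filter is meant to catch.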

📦 Resources
- Paper: https://www.sciencedirect.com/science/article/pii/S0020025526005220
- PT model: yuriyvnv/WAVe-1B-Multimodal-PT
- NL model: yuriyvnv/WAVe-1B-Multimodal-NL
- Collection: https://huggingface.co/collections/yuriyvnv/multi-modal-embeddings-for-synthetic-transcript-filtering
- Code: https://github.com/yuriyvnv/WAVe

If you train ASR on synthetic or back-translated data, I would like to see WAVe benchmarked on other languages.

@reach-vb @ylacombe @hf-audio @BramVanroy

#speech #asr #multimodal #syntheticdata #lowresource