SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion • Paper • 2402.12660 • Published Feb 20, 2024
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation • Paper • 2505.13000 • Published May 19, 2025 • 1
Vevo2: Bridging Controllable Speech and Singing Voice Generation via Unified Prosody Learning • Paper • 2508.16332 • Published Aug 22, 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment • Paper • 2505.04113 • Published May 7, 2025
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness • Paper • 2511.07931 • Published Nov 11, 2025
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation • Paper • 2407.05361 • Published Jul 7, 2024 • 2
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation • Paper • 2501.15907 • Published Jan 27, 2025 • 17
Closing the Modality Reasoning Gap for Speech Large Language Models • Paper • 2601.05543 • Published 5 days ago