diwank's Collections
Audio
Updated 1 day ago
espnet/yodas2
Updated
May 15
•
43.6k
•
43
Flux9665/BibleMMS
Viewer
•
Updated
Jun 16, 2024
•
736k
•
1.76k
•
70
google/MusicCaps
Viewer
•
Updated
Mar 8, 2023
•
5.52k
•
740
•
143
ShoukanLabs/AniSpeech
Viewer
•
Updated
Jan 29, 2024
•
23.7k
•
1.09k
•
58
aoxo/text2asmr-uncensored
Preview
•
Updated
Feb 19, 2024
•
63
•
17
google/fleurs
Updated
Aug 25, 2024
•
34.7k
•
356
phongdtd/youtube_casual_audio
Updated
Sep 10, 2024
•
246
•
4
ProgramComputer/voxceleb
Updated
Jul 27, 2024
•
4.13k
•
97
jhu-clsp/seamless-align
Preview
•
Updated
Jun 2, 2024
•
336
•
13
IVLLab/MultiDialog
Updated
Aug 29, 2024
•
1.55k
•
27
PetraAI/PetraAI
Updated
Sep 14, 2023
•
670
•
21
ReDUB/SoundHarvest
Viewer
•
Updated
Dec 14, 2023
•
2
•
97
•
2
jhu-clsp/seamless-align-expressive
Updated
Feb 22, 2024
•
65
•
5
jg583/NSynth
Updated
Apr 26, 2024
•
461
•
20
voice-is-cool/voxtube
Viewer
•
Updated
Feb 13, 2024
•
4.46M
•
1.22k
•
18
google/speech_commands
Updated
Jan 18, 2024
•
2.23k
•
54
Fhrozen/FSD50k
Preview
•
Updated
May 17
•
5.08k
•
9
nvidia/parakeet-tdt-1.1b
Automatic Speech Recognition
•
Updated
5 days ago
•
2.4k
•
110
yl4579/StyleTTS2-LibriTTS
Updated
Nov 21, 2023
•
54
coqui/XTTS-v2
Text-to-Speech
•
Updated
Dec 11, 2023
•
6.33M
•
3.22k
facebook/wav2vec2-large-robust
Updated
Nov 5, 2021
•
2.23k
•
37
laion/links_to_pocasts_lecture_and_shows_for_tts
Viewer
•
Updated
May 29, 2024
•
331k
•
21
•
9
laion/youtube-urls-for-emotional-tts
Viewer
•
Updated
May 21, 2024
•
78.3k
•
39
•
3
laion/chirp-v2-dataset
Viewer
•
Updated
Mar 25, 2024
•
64
•
36
•
6
speechcolab/gigaspeech
Viewer
•
Updated
Nov 23, 2023
•
364k
•
18.4k
•
140
fixie-ai/boolq-audio
Viewer
•
Updated
Jun 12, 2024
•
12.7k
•
587
•
7
fixie-ai/soda-audio
Viewer
•
Updated
Jul 24, 2024
•
102k
•
148
•
4
amphion/Emilia
Preview
•
Updated
Sep 3
•
217
•
86
google/cvss
Updated
Feb 10, 2024
•
139
•
15
PolyAI/minds14
Viewer
•
Updated
Aug 12
•
16.3k
•
8.08k
•
93
Qwen/Qwen2-Audio-7B-Instruct
Audio-Text-to-Text
•
8B
•
Updated
Jan 12
•
206k
•
499
infgrad/dialogue_rewrite_llm
Viewer
•
Updated
Feb 17, 2024
•
1.64M
•
41
•
16
FBK-MT/Speech-MASSIVE
Viewer
•
Updated
Oct 7
•
97.6k
•
2.39k
•
46
Qwen/Qwen2-Audio-7B
Audio-Text-to-Text
•
8B
•
Updated
Nov 20, 2024
•
46.9k
•
155
Mozilla/whisperfile
Updated
Oct 2, 2024
•
1.37k
•
255
vucinatim/spectrogram-captions
Viewer
•
Updated
Jan 3, 2023
•
1k
•
86
•
4
rachit8562/mel_spectogram_bird_audio
Viewer
•
Updated
Jan 7, 2023
•
72.2k
•
71
•
2
novateur/WavTokenizer
Text-to-Speech
•
Updated
Dec 2, 2024
•
54
gpt-omni/mini-omni
Text-to-Speech
•
Updated
Sep 4, 2024
•
1
•
435
amphion/Emilia-Dataset
Viewer
•
Updated
Feb 28
•
54.8M
•
89.7k
•
404
FLUX that Plays Music
Paper
•
2409.00587
•
Published
Sep 1, 2024
•
33
feizhengcong/FluxMusic
Updated
Nov 22, 2024
•
67
fishaudio/fish-speech-1.4
Text-to-Speech
•
Updated
Nov 5, 2024
•
246
•
453
ICTNLP/Llama-3.1-8B-Omni
9B
•
Updated
Nov 14, 2024
•
107
•
415
HuggingFaceFV/finevideo
Viewer
•
Updated
Dec 16, 2024
•
39.5k
•
11.5k
•
331
kyutai/moshiko-pytorch-bf16
Updated
Sep 18, 2024
•
330k
•
190
kyutai/moshika-pytorch-bf16
Updated
Sep 18, 2024
•
564
•
58
Revai/reverb-asr
Automatic Speech Recognition
•
Updated
Dec 9, 2024
•
11
•
92
FBK-MT/mosel
Viewer
•
Updated
Oct 7
•
2.2M
•
3.52k
•
85
Menlo/llama3-s-instruct-v0.2
8B
•
Updated
Aug 23, 2024
•
18
•
45
SWivid/F5-TTS
Text-to-Speech
•
Updated
Mar 21
•
793k
•
1.13k
mit-han-lab/hart-0.7b-1024px
Unconditional Image Generation
•
Updated
Nov 17, 2024
•
13
zai-org/glm-4-voice-9b
10B
•
Updated
Oct 25, 2024
•
1.15k
•
109
amphion/MaskGCT
Text-to-Speech
•
Updated
Apr 13
•
573
•
301
nvidia/parakeet-tdt_ctc-110m
Automatic Speech Recognition
•
Updated
Feb 18
•
79.7k
•
36
nvidia/audio-flamingo
Updated
Oct 2, 2024
•
27
fishaudio/fish-agent-v0.1-3b
Audio-to-Audio
•
Updated
Nov 1, 2024
•
23
•
266
OuteAI/OuteTTS-0.1-350M
Text-to-Speech
•
0.4B
•
Updated
Apr 17
•
1.2k
•
302
adamo1139/Meta_Spirit-LM-ungated
Text-to-Audio
•
Updated
Oct 20, 2024
•
18
si-pbc/hertz-dev
Audio-to-Audio
•
Updated
Nov 14, 2024
•
215
pyannote/speech-separation-ami-1.0
Updated
Nov 11, 2024
•
3.82k
•
69
nyuuzyou/suno
Preview
•
Updated
Nov 20, 2024
•
96
•
73
gpt-omni/mini-omni2
Any-to-Any
•
Updated
Oct 24, 2024
•
131
•
279
fixie-ai/ultravox-v0_4_1-llama-3_1-70b
Audio-Text-to-Text
•
58.7M
•
Updated
May 6
•
34
•
24
aiola/whisper-ner-tag-and-mask-v1
Automatic Speech Recognition
•
2B
•
Updated
Nov 21, 2024
•
17
•
6
nyrahealth/CrisperWhisper
Automatic Speech Recognition
•
2B
•
Updated
Dec 19, 2024
•
92.7k
•
319
laion/laions_got_talent
Viewer
•
Updated
Jan 5
•
461k
•
13.6k
•
39
nvidia/se_den_sb_16k_small
Updated
Nov 28, 2024
•
2
nvidia/se_der_sb_16k_small
Updated
Nov 28, 2024
•
3
nvidia/sr_ssl_flowmatching_16k_430m
Updated
Nov 28, 2024
•
8
nvidia/low-frame-rate-speech-codec-22khz
Feature Extraction
•
Updated
Aug 5
•
153
•
19
laion/laion-audio-preview
Viewer
•
Updated
Dec 4, 2024
•
4.15M
•
3.05k
•
11
NexaAI/OmniAudio-2.6B
Audio-Text-to-Text
•
3B
•
Updated
Dec 13, 2024
•
1.2k
•
281
laion/LAION-Audio-300M
Viewer
•
Updated
Jan 10
•
229M
•
37.8k
•
47
hexgrad/Kokoro-82M
Text-to-Speech
•
Updated
Apr 10
•
4.13M
•
5.37k
ByteDance/Make-An-Audio-2
Updated
May 22, 2024
•
14
tincans-ai/pause-asr-alpha
Automatic Speech Recognition
•
94.4M
•
Updated
Sep 17, 2024
•
13
•
6
nvidia/bigvgan_v2_44khz_128band_512x
Audio-to-Audio
•
Updated
Sep 5, 2024
•
491k
•
60
speechbrain/sepformer-wham
Audio-to-Audio
•
Updated
Feb 19, 2024
•
410
•
44
blaise-tk/TITAN
Audio-to-Audio
•
Updated
Aug 19, 2024
•
25
•
65
ResembleAI/resemble-enhance
Audio-to-Audio
•
Updated
Dec 21, 2023
•
170
declare-lab/TangoFlux
Text-to-Audio
•
Updated
May 7
•
516
•
100
declare-lab/tango-full
Text-to-Audio
•
Updated
Jun 10, 2024
•
21
•
12
declare-lab/mustango
Text-to-Audio
•
Updated
Dec 17, 2023
•
10.7k
•
41
declare-lab/tango2
Text-to-Audio
•
Updated
Apr 16, 2024
•
36
•
18
declare-lab/tango2-full
Text-to-Audio
•
Updated
Dec 29, 2024
•
23
•
11
HKUSTAudio/Llasa-3B
Text-to-Speech
•
4B
•
Updated
May 10
•
1.71k
•
522
fixie-ai/ultravox-v0_4_1-llama-3_3-70b
Audio-Text-to-Text
•
58.7M
•
Updated
May 6
•
20
•
11
UsefulSensors/moonshine-base
Automatic Speech Recognition
•
61.5M
•
Updated
Jan 30
•
7.1k
•
35
UsefulSensors/moonshine
Automatic Speech Recognition
•
Updated
9 days ago
•
82
laion/laions_got_talent_raw
Viewer
•
Updated
Jan 13
•
59k
•
423
•
6
HKUSTAudio/Llasa-8B
Text-to-Speech
•
9B
•
Updated
Mar 9
•
973
•
96
baichuan-inc/Baichuan-Omni-1d5
11B
•
Updated
Feb 8
•
80
•
46
m-a-p/YuE-s1-7B-anneal-en-icl
Text Generation
•
6B
•
Updated
Mar 12
•
1.1k
•
52
m-a-p/YuE-s1-7B-anneal-en-cot
Text Generation
•
6B
•
Updated
Mar 12
•
19k
•
434
unlimitedbytes/hailuo-ai-voices
Viewer
•
Updated
Jan 19
•
68k
•
337
•
6
m-a-p/YuE-s2-1B-general
Text Generation
•
2B
•
Updated
Mar 12
•
6.49k
•
57
Zyphra/Zonos-v0.1-speaker-embedding
Updated
Feb 12
•
28
Zyphra/Zonos-v0.1-hybrid
Text-to-Speech
•
Updated
Jun 3
•
42k
•
1.1k
FunAudioLLM/InspireMusic-1.5B-24kHz
2B
•
Updated
Mar 28
•
16
•
6
jadechoghari/VoiceRestore
Audio-to-Audio
•
Updated
Oct 2, 2024
•
52
•
44
stepfun-ai/Step-Audio-Tokenizer
Updated
Feb 18
•
48
stepfun-ai/Step-Audio-TTS-3B
Text-to-Speech
•
4B
•
Updated
Feb 17
•
192
•
192
stepfun-ai/Step-Audio-Chat
Audio-Text-to-Text
•
132B
•
Updated
Feb 17
•
145
•
458
Felguk/Felguk-omni-v0
Audio-Text-to-Text
•
Updated
Jan 19
•
15
•
2
livekit/turn-detector
Text Generation
•
0.1B
•
Updated
Dec 12, 2024
•
71.3k
•
83
facebook/jasco-chords-drums-melody-1B
Updated
Mar 13
•
11
HKUSTAudio/Spark-TTS-0.5B
Text-to-Speech
•
Updated
Mar 7
•
5
•
6
ASLP-lab/DiffRhythm-base
Updated
Mar 26
•
45
•
171
SparkAudio/Spark-TTS-0.5B
Text-to-Speech
•
Updated
Mar 7
•
984
•
708
nvidia/audio-flamingo-2-0.5B
Audio-Text-to-Text
•
Updated
Jun 25
•
13
sesame/csm-1b
Text-to-Speech
•
Updated
7 days ago
•
25.9k
•
2.28k
kyutai/mimi
Feature Extraction
•
96.2M
•
Updated
Jul 2
•
410k
•
270
Roblox/voice-safety-classifier
Audio Classification
•
94.6M
•
Updated
Jul 8, 2024
•
193
•
40
canopylabs/orpheus-3b-0.1-pretrained
Text-to-Speech
•
4B
•
Updated
Mar 19
•
5.72k
•
159
ibm-granite/granite-speech-3.2-8b
Automatic Speech Recognition
•
8B
•
Updated
Apr 16
•
123
•
84
ByteDance/MegaTTS3
Text-to-Speech
•
Updated
Apr 4
•
181
•
412
amphion/Vevo
Text-to-Speech
•
Updated
Apr 13
•
27
•
42
amphion/Vevo1.5
Updated
Apr 13
•
52
•
22
kyutai/DailyTalkContiguous
Preview
•
Updated
Mar 24
•
17.5k
•
18
nvidia/parakeet-tdt-0.6b-v2
Automatic Speech Recognition
•
Updated
11 days ago
•
670k
•
1.38k
ibm-granite/granite-speech-3.3-8b
Automatic Speech Recognition
•
9B
•
Updated
Aug 19
•
67.7k
•
146
ICTNLP/SLED-TTS-Streaming-Libriheavy
Text-to-Speech
•
0.2B
•
Updated
Jun 17
•
20
•
7
ACE-Step/ACE-Step-v1-3.5B
Text-to-Audio
•
Updated
May 22
•
641
VITA-MLLM/VITA-Audio-Plus-Vanilla
8B
•
Updated
May 6
•
209
•
5
ICTNLP/InstructS2S-200K
Viewer
•
Updated
8 days ago
•
200k
•
1.29k
•
8
ICTNLP/LLaMA-Omni2-14B
16B
•
Updated
May 19
•
9
•
1
laion/empathic-insights-voice
Updated
May 18
•
495
•
1
disco-eth/EuroSpeech
Viewer
•
Updated
Sep 29
•
8.42M
•
22k
•
91
TEN-framework/ten-vad
Updated
Jul 9
•
33
•
119
TEN-framework/TEN_Turn_Detection
Text Generation
•
8B
•
Updated
May 27
•
9.89k
•
57
open-r1/Mixture-of-Thoughts
Viewer
•
Updated
May 26
•
699k
•
5k
•
290
fishaudio/openaudio-s1-mini
Text-to-Speech
•
Updated
Jun 2
•
3.64k
•
526
tencent/SongGeneration
Text-to-Audio
•
Updated
Oct 23
•
598
•
269
kyutai/stt-2.6b-en-trfs
Automatic Speech Recognition
•
3B
•
Updated
Jun 26
•
3.01k
•
10
nvidia/canary-qwen-2.5b
Automatic Speech Recognition
•
3B
•
Updated
5 days ago
•
10.6k
•
314
mistralai/Voxtral-Mini-3B-2507
5B
•
Updated
Jul 28
•
527k
•
596
bosonai/higgs-audio-v2-generation-3B-base
Text-to-Speech
•
6B
•
Updated
Jul 28
•
158k
•
644
stepfun-ai/Step-Audio-AQAA
137B
•
Updated
Jun 12
•
62
•
46
nvidia/audio-flamingo-3-chat
Updated
Jul 14
•
119
•
43
nvidia/audio-flamingo-3
Audio-Text-to-Text
•
Updated
10 days ago
•
987
•
135
KittenML/kitten-tts-nano-0.1
Updated
Aug 30
•
26.7k
•
490
FunAudioLLM/CosyVoice-300M-Instruct
Updated
Aug 11
•
2
nvidia/canary-1b-v2
Automatic Speech Recognition
•
Updated
5 days ago
•
106k
•
312
Vyvo/VyvoTTS-LFM2-Multi-Speaker
0.4B
•
Updated
Aug 15
•
10
•
6
microsoft/VibeVoice-1.5B
Text-to-Speech
•
3B
•
Updated
Sep 1
•
341k
•
2.05k
allenai/OLMoASR
Audio-Text-to-Text
•
Updated
Aug 28
•
69
tencent/HunyuanVideo-Foley
Text-to-Audio
•
Updated
Sep 29
•
413
•
147
stepfun-ai/Step-Audio-2-mini
Any-to-Any
•
8B
•
Updated
Sep 5
•
1.34k
•
239
UsefulSensors/moonshine-tiny
Automatic Speech Recognition
•
27.1M
•
Updated
Jan 30
•
10.2k
•
21
RMSnow/Vevo2
Updated
Sep 8
•
2
OuteAI/Llama-OuteTTS-1.0-1B
Text-to-Speech
•
1B
•
Updated
Sep 8
•
7.25k
•
235
aoi-ot/VibeVoice-Large
Text-to-Speech
•
9B
•
Updated
Sep 25
•
16.5k
•
204
FreedomIntelligence/EchoX-8B
10B
•
Updated
Sep 19
•
15
•
10
kyutai/tts-0.75b-en-public
Text-to-Speech
•
Updated
Sep 11
•
79.9k
•
10
kyutai/stt-2.6b-en
Automatic Speech Recognition
•
Updated
Jun 26
•
113
kyutai/stt-1b-en_fr
Automatic Speech Recognition
•
Updated
20 days ago
•
102
IndexTeam/IndexTTS-2
Updated
Sep 8
•
21.9k
•
528
openbmb/VoxCPM-0.5B
Text-to-Speech
•
Updated
Sep 19
•
2.43k
•
770
amphion/anyaccomp
Updated
Oct 19
•
13
•
7
FreedomIntelligence/ExpressiveSpeech
Viewer
•
Updated
Oct 24
•
10.8k
•
222
•
9
Qwen/Qwen3-Omni-30B-A3B-Instruct
Any-to-Any
•
35B
•
Updated
Sep 22
•
283k
•
744
Qwen/Qwen3-Omni-30B-A3B-Thinking
Any-to-Any
•
32B
•
Updated
Sep 22
•
50k
•
230
Qwen/Qwen3-Omni-30B-A3B-Captioner
Any-to-Any
•
32B
•
Updated
Sep 22
•
22.1k
•
177
inclusionAI/Ming-UniAudio-16B-A3B-Edit
18B
•
Updated
Oct 2
•
317
•
27
inclusionAI/Ming-UniAudio-16B-A3B
Any-to-Any
•
18B
•
Updated
15 days ago
•
310
•
72
TencentARC/AudioStory-3B
Updated
Sep 30
•
6
•
7
LiquidAI/LFM2-Audio-1.5B
Audio-to-Audio
•
1B
•
Updated
3 days ago
•
2.58k
•
303
inclusionAI/Ming-Lite-Omni-1.5
Any-to-Any
•
19B
•
Updated
Aug 29
•
738
•
81
Linq-AI-Research/FinDER
Viewer
•
Updated
Oct 2
•
5.7k
•
951
•
7
neuphonic/neutts-air
Text-to-Speech
•
0.7B
•
Updated
Oct 10
•
23.1k
•
795
nvidia/diar_streaming_sortformer_4spk-v2
Audio Classification
•
Updated
5 days ago
•
11.8k
•
77
nvidia/audio-flamingo-3-hf
Audio-Text-to-Text
•
8B
•
Updated
9 days ago
•
10.9k
•
133
inclusionAI/Ming-flash-omni-Preview
Any-to-Any
•
104B
•
Updated
Oct 30
•
7.89k
•
64
TMElyralab/MuseTalk
Updated
Mar 31
•
131
disco-eth/sao-instruct
Updated
Oct 28
•
75
•
4
Soul-AILab/SoulX-Podcast-1.7B-dialect
Text-to-Speech
•
2B
•
Updated
Nov 2
•
534
•
24
maya-research/maya1
Text-to-Speech
•
3B
•
Updated
27 days ago
•
71.5k
•
809
nvidia/parakeet_realtime_eou_120m-v1
Updated
5 days ago
•
4.24k
•
100
Supertone/supertonic
Text-to-Speech
•
Updated
about 14 hours ago
•
16.3k
•
418
stepfun-ai/Step-Audio-R1
Audio-Text-to-Text
•
33B
•
Updated
7 days ago
•
457
•
123
nari-labs/Dia2-2B
Text-to-Speech
•
Updated
7 days ago
•
10k
•
124
stepfun-ai/Step-Audio-EditX
Text-to-Speech
•
4B
•
Updated
10 days ago
•
1.11k
•
101
facebook/omniASR-CTC-3B
Automatic Speech Recognition
•
Updated
11 days ago
•
2
facebook/omniASR-LLM-1B
Automatic Speech Recognition
•
Updated
10 days ago
•
3
microsoft/VibeVoice-Realtime-0.5B
Text-to-Speech
•
1B
•
Updated
about 15 hours ago
•
40.5k
•
519