Audio - a diwank Collection

Audio

updated 1 day ago

espnet/yodas2

Updated May 15 • 43.6k • 43
Flux9665/BibleMMS

Viewer • Updated Jun 16, 2024 • 736k • 1.76k • 70
google/MusicCaps

Viewer • Updated Mar 8, 2023 • 5.52k • 740 • 143
ShoukanLabs/AniSpeech

Viewer • Updated Jan 29, 2024 • 23.7k • 1.09k • 58
aoxo/text2asmr-uncensored

Preview • Updated Feb 19, 2024 • 63 • 17
google/fleurs

Updated Aug 25, 2024 • 34.7k • 356
phongdtd/youtube_casual_audio

Updated Sep 10, 2024 • 246 • 4
ProgramComputer/voxceleb

Updated Jul 27, 2024 • 4.13k • 97
jhu-clsp/seamless-align

Preview • Updated Jun 2, 2024 • 336 • 13
IVLLab/MultiDialog

Updated Aug 29, 2024 • 1.55k • 27
PetraAI/PetraAI

Updated Sep 14, 2023 • 670 • 21
ReDUB/SoundHarvest

Viewer • Updated Dec 14, 2023 • 2 • 97 • 2
jhu-clsp/seamless-align-expressive

Updated Feb 22, 2024 • 65 • 5
jg583/NSynth

Updated Apr 26, 2024 • 461 • 20
voice-is-cool/voxtube

Viewer • Updated Feb 13, 2024 • 4.46M • 1.22k • 18
google/speech_commands

Updated Jan 18, 2024 • 2.23k • 54
Fhrozen/FSD50k

Preview • Updated May 17 • 5.08k • 9
nvidia/parakeet-tdt-1.1b

Automatic Speech Recognition • Updated 5 days ago • 2.4k • 110
yl4579/StyleTTS2-LibriTTS

Updated Nov 21, 2023 • 54
coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 6.33M • 3.22k
facebook/wav2vec2-large-robust

Updated Nov 5, 2021 • 2.23k • 37
laion/links_to_pocasts_lecture_and_shows_for_tts

Viewer • Updated May 29, 2024 • 331k • 21 • 9
laion/youtube-urls-for-emotional-tts

Viewer • Updated May 21, 2024 • 78.3k • 39 • 3
laion/chirp-v2-dataset

Viewer • Updated Mar 25, 2024 • 64 • 36 • 6
speechcolab/gigaspeech

Viewer • Updated Nov 23, 2023 • 364k • 18.4k • 140
fixie-ai/boolq-audio

Viewer • Updated Jun 12, 2024 • 12.7k • 587 • 7
fixie-ai/soda-audio

Viewer • Updated Jul 24, 2024 • 102k • 148 • 4
amphion/Emilia

Preview • Updated Sep 3 • 217 • 86
google/cvss

Updated Feb 10, 2024 • 139 • 15
PolyAI/minds14

Viewer • Updated Aug 12 • 16.3k • 8.08k • 93
Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • 8B • Updated Jan 12 • 206k • 499
infgrad/dialogue_rewrite_llm

Viewer • Updated Feb 17, 2024 • 1.64M • 41 • 16
FBK-MT/Speech-MASSIVE

Viewer • Updated Oct 7 • 97.6k • 2.39k • 46
Qwen/Qwen2-Audio-7B

Audio-Text-to-Text • 8B • Updated Nov 20, 2024 • 46.9k • 155
Mozilla/whisperfile

Updated Oct 2, 2024 • 1.37k • 255
vucinatim/spectrogram-captions

Viewer • Updated Jan 3, 2023 • 1k • 86 • 4
rachit8562/mel_spectogram_bird_audio

Viewer • Updated Jan 7, 2023 • 72.2k • 71 • 2
novateur/WavTokenizer

Text-to-Speech • Updated Dec 2, 2024 • 54
gpt-omni/mini-omni

Text-to-Speech • Updated Sep 4, 2024 • 1 • 435
amphion/Emilia-Dataset

Viewer • Updated Feb 28 • 54.8M • 89.7k • 404
FLUX that Plays Music

Paper • 2409.00587 • Published Sep 1, 2024 • 33
feizhengcong/FluxMusic

Updated Nov 22, 2024 • 67
fishaudio/fish-speech-1.4

Text-to-Speech • Updated Nov 5, 2024 • 246 • 453
ICTNLP/Llama-3.1-8B-Omni

9B • Updated Nov 14, 2024 • 107 • 415
HuggingFaceFV/finevideo

Viewer • Updated Dec 16, 2024 • 39.5k • 11.5k • 331
kyutai/moshiko-pytorch-bf16

Updated Sep 18, 2024 • 330k • 190
kyutai/moshika-pytorch-bf16

Updated Sep 18, 2024 • 564 • 58
Revai/reverb-asr

Automatic Speech Recognition • Updated Dec 9, 2024 • 11 • 92
FBK-MT/mosel

Viewer • Updated Oct 7 • 2.2M • 3.52k • 85
Menlo/llama3-s-instruct-v0.2

8B • Updated Aug 23, 2024 • 18 • 45
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21 • 793k • 1.13k
mit-han-lab/hart-0.7b-1024px

Unconditional Image Generation • Updated Nov 17, 2024 • 13
zai-org/glm-4-voice-9b

10B • Updated Oct 25, 2024 • 1.15k • 109
amphion/MaskGCT

Text-to-Speech • Updated Apr 13 • 573 • 301
nvidia/parakeet-tdt_ctc-110m

Automatic Speech Recognition • Updated Feb 18 • 79.7k • 36
nvidia/audio-flamingo

Updated Oct 2, 2024 • 27
fishaudio/fish-agent-v0.1-3b

Audio-to-Audio • Updated Nov 1, 2024 • 23 • 266
OuteAI/OuteTTS-0.1-350M

Text-to-Speech • 0.4B • Updated Apr 17 • 1.2k • 302
adamo1139/Meta_Spirit-LM-ungated

Text-to-Audio • Updated Oct 20, 2024 • 18
si-pbc/hertz-dev

Audio-to-Audio • Updated Nov 14, 2024 • 215
pyannote/speech-separation-ami-1.0

Updated Nov 11, 2024 • 3.82k • 69
nyuuzyou/suno

Preview • Updated Nov 20, 2024 • 96 • 73
gpt-omni/mini-omni2

Any-to-Any • Updated Oct 24, 2024 • 131 • 279
fixie-ai/ultravox-v0_4_1-llama-3_1-70b

Audio-Text-to-Text • 58.7M • Updated May 6 • 34 • 24
aiola/whisper-ner-tag-and-mask-v1

Automatic Speech Recognition • 2B • Updated Nov 21, 2024 • 17 • 6
nyrahealth/CrisperWhisper

Automatic Speech Recognition • 2B • Updated Dec 19, 2024 • 92.7k • 319
laion/laions_got_talent

Viewer • Updated Jan 5 • 461k • 13.6k • 39
nvidia/se_den_sb_16k_small

Updated Nov 28, 2024 • 2
nvidia/se_der_sb_16k_small

Updated Nov 28, 2024 • 3
nvidia/sr_ssl_flowmatching_16k_430m

Updated Nov 28, 2024 • 8
nvidia/low-frame-rate-speech-codec-22khz

Feature Extraction • Updated Aug 5 • 153 • 19
laion/laion-audio-preview

Viewer • Updated Dec 4, 2024 • 4.15M • 3.05k • 11
NexaAI/OmniAudio-2.6B

Audio-Text-to-Text • 3B • Updated Dec 13, 2024 • 1.2k • 281
laion/LAION-Audio-300M

Viewer • Updated Jan 10 • 229M • 37.8k • 47
hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10 • 4.13M • 5.37k
ByteDance/Make-An-Audio-2

Updated May 22, 2024 • 14
tincans-ai/pause-asr-alpha

Automatic Speech Recognition • 94.4M • Updated Sep 17, 2024 • 13 • 6
nvidia/bigvgan_v2_44khz_128band_512x

Audio-to-Audio • Updated Sep 5, 2024 • 491k • 60
speechbrain/sepformer-wham

Audio-to-Audio • Updated Feb 19, 2024 • 410 • 44
blaise-tk/TITAN

Audio-to-Audio • Updated Aug 19, 2024 • 25 • 65
ResembleAI/resemble-enhance

Audio-to-Audio • Updated Dec 21, 2023 • 170
declare-lab/TangoFlux

Text-to-Audio • Updated May 7 • 516 • 100
declare-lab/tango-full

Text-to-Audio • Updated Jun 10, 2024 • 21 • 12
declare-lab/mustango

Text-to-Audio • Updated Dec 17, 2023 • 10.7k • 41
declare-lab/tango2

Text-to-Audio • Updated Apr 16, 2024 • 36 • 18
declare-lab/tango2-full

Text-to-Audio • Updated Dec 29, 2024 • 23 • 11
HKUSTAudio/Llasa-3B

Text-to-Speech • 4B • Updated May 10 • 1.71k • 522
fixie-ai/ultravox-v0_4_1-llama-3_3-70b

Audio-Text-to-Text • 58.7M • Updated May 6 • 20 • 11
UsefulSensors/moonshine-base

Automatic Speech Recognition • 61.5M • Updated Jan 30 • 7.1k • 35
UsefulSensors/moonshine

Automatic Speech Recognition • Updated 9 days ago • 82
laion/laions_got_talent_raw

Viewer • Updated Jan 13 • 59k • 423 • 6
HKUSTAudio/Llasa-8B

Text-to-Speech • 9B • Updated Mar 9 • 973 • 96
baichuan-inc/Baichuan-Omni-1d5

11B • Updated Feb 8 • 80 • 46
m-a-p/YuE-s1-7B-anneal-en-icl

Text Generation • 6B • Updated Mar 12 • 1.1k • 52
m-a-p/YuE-s1-7B-anneal-en-cot

Text Generation • 6B • Updated Mar 12 • 19k • 434
unlimitedbytes/hailuo-ai-voices

Viewer • Updated Jan 19 • 68k • 337 • 6
m-a-p/YuE-s2-1B-general

Text Generation • 2B • Updated Mar 12 • 6.49k • 57
Zyphra/Zonos-v0.1-speaker-embedding

Updated Feb 12 • 28
Zyphra/Zonos-v0.1-hybrid

Text-to-Speech • Updated Jun 3 • 42k • 1.1k
FunAudioLLM/InspireMusic-1.5B-24kHz

2B • Updated Mar 28 • 16 • 6
jadechoghari/VoiceRestore

Audio-to-Audio • Updated Oct 2, 2024 • 52 • 44
stepfun-ai/Step-Audio-Tokenizer

Updated Feb 18 • 48
stepfun-ai/Step-Audio-TTS-3B

Text-to-Speech • 4B • Updated Feb 17 • 192 • 192
stepfun-ai/Step-Audio-Chat

Audio-Text-to-Text • 132B • Updated Feb 17 • 145 • 458
Felguk/Felguk-omni-v0

Audio-Text-to-Text • Updated Jan 19 • 15 • 2
livekit/turn-detector

Text Generation • 0.1B • Updated Dec 12, 2024 • 71.3k • 83
facebook/jasco-chords-drums-melody-1B

Updated Mar 13 • 11
HKUSTAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7 • 5 • 6
ASLP-lab/DiffRhythm-base

Updated Mar 26 • 45 • 171
SparkAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7 • 984 • 708
nvidia/audio-flamingo-2-0.5B

Audio-Text-to-Text • Updated Jun 25 • 13
sesame/csm-1b

Text-to-Speech • Updated 7 days ago • 25.9k • 2.28k
kyutai/mimi

Feature Extraction • 96.2M • Updated Jul 2 • 410k • 270
Roblox/voice-safety-classifier

Audio Classification • 94.6M • Updated Jul 8, 2024 • 193 • 40
canopylabs/orpheus-3b-0.1-pretrained

Text-to-Speech • 4B • Updated Mar 19 • 5.72k • 159
ibm-granite/granite-speech-3.2-8b

Automatic Speech Recognition • 8B • Updated Apr 16 • 123 • 84
ByteDance/MegaTTS3

Text-to-Speech • Updated Apr 4 • 181 • 412
amphion/Vevo

Text-to-Speech • Updated Apr 13 • 27 • 42
amphion/Vevo1.5

Updated Apr 13 • 52 • 22
kyutai/DailyTalkContiguous

Preview • Updated Mar 24 • 17.5k • 18
nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition • Updated 11 days ago • 670k • 1.38k
ibm-granite/granite-speech-3.3-8b

Automatic Speech Recognition • 9B • Updated Aug 19 • 67.7k • 146
ICTNLP/SLED-TTS-Streaming-Libriheavy

Text-to-Speech • 0.2B • Updated Jun 17 • 20 • 7
ACE-Step/ACE-Step-v1-3.5B

Text-to-Audio • Updated May 22 • 641
VITA-MLLM/VITA-Audio-Plus-Vanilla

8B • Updated May 6 • 209 • 5
ICTNLP/InstructS2S-200K

Viewer • Updated 8 days ago • 200k • 1.29k • 8
ICTNLP/LLaMA-Omni2-14B

16B • Updated May 19 • 9 • 1
laion/empathic-insights-voice

Updated May 18 • 495 • 1
disco-eth/EuroSpeech

Viewer • Updated Sep 29 • 8.42M • 22k • 91
TEN-framework/ten-vad

Updated Jul 9 • 33 • 119
TEN-framework/TEN_Turn_Detection

Text Generation • 8B • Updated May 27 • 9.89k • 57
open-r1/Mixture-of-Thoughts

Viewer • Updated May 26 • 699k • 5k • 290
fishaudio/openaudio-s1-mini

Text-to-Speech • Updated Jun 2 • 3.64k • 526
tencent/SongGeneration

Text-to-Audio • Updated Oct 23 • 598 • 269
kyutai/stt-2.6b-en-trfs

Automatic Speech Recognition • 3B • Updated Jun 26 • 3.01k • 10
nvidia/canary-qwen-2.5b

Automatic Speech Recognition • 3B • Updated 5 days ago • 10.6k • 314
mistralai/Voxtral-Mini-3B-2507

5B • Updated Jul 28 • 527k • 596
bosonai/higgs-audio-v2-generation-3B-base

Text-to-Speech • 6B • Updated Jul 28 • 158k • 644
stepfun-ai/Step-Audio-AQAA

137B • Updated Jun 12 • 62 • 46
nvidia/audio-flamingo-3-chat

Updated Jul 14 • 119 • 43
nvidia/audio-flamingo-3

Audio-Text-to-Text • Updated 10 days ago • 987 • 135
KittenML/kitten-tts-nano-0.1

Updated Aug 30 • 26.7k • 490
FunAudioLLM/CosyVoice-300M-Instruct

Updated Aug 11 • 2
nvidia/canary-1b-v2

Automatic Speech Recognition • Updated 5 days ago • 106k • 312
Vyvo/VyvoTTS-LFM2-Multi-Speaker

0.4B • Updated Aug 15 • 10 • 6
microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Sep 1 • 341k • 2.05k
allenai/OLMoASR

Audio-Text-to-Text • Updated Aug 28 • 69
tencent/HunyuanVideo-Foley

Text-to-Audio • Updated Sep 29 • 413 • 147
stepfun-ai/Step-Audio-2-mini

Any-to-Any • 8B • Updated Sep 5 • 1.34k • 239
UsefulSensors/moonshine-tiny

Automatic Speech Recognition • 27.1M • Updated Jan 30 • 10.2k • 21
RMSnow/Vevo2

Updated Sep 8 • 2
OuteAI/Llama-OuteTTS-1.0-1B

Text-to-Speech • 1B • Updated Sep 8 • 7.25k • 235
aoi-ot/VibeVoice-Large

Text-to-Speech • 9B • Updated Sep 25 • 16.5k • 204
FreedomIntelligence/EchoX-8B

10B • Updated Sep 19 • 15 • 10
kyutai/tts-0.75b-en-public

Text-to-Speech • Updated Sep 11 • 79.9k • 10
kyutai/stt-2.6b-en

Automatic Speech Recognition • Updated Jun 26 • 113
kyutai/stt-1b-en_fr

Automatic Speech Recognition • Updated 20 days ago • 102
IndexTeam/IndexTTS-2

Updated Sep 8 • 21.9k • 528
openbmb/VoxCPM-0.5B

Text-to-Speech • Updated Sep 19 • 2.43k • 770
amphion/anyaccomp

Updated Oct 19 • 13 • 7
FreedomIntelligence/ExpressiveSpeech

Viewer • Updated Oct 24 • 10.8k • 222 • 9
Qwen/Qwen3-Omni-30B-A3B-Instruct

Any-to-Any • 35B • Updated Sep 22 • 283k • 744
Qwen/Qwen3-Omni-30B-A3B-Thinking

Any-to-Any • 32B • Updated Sep 22 • 50k • 230
Qwen/Qwen3-Omni-30B-A3B-Captioner

Any-to-Any • 32B • Updated Sep 22 • 22.1k • 177
inclusionAI/Ming-UniAudio-16B-A3B-Edit

18B • Updated Oct 2 • 317 • 27
inclusionAI/Ming-UniAudio-16B-A3B

Any-to-Any • 18B • Updated 15 days ago • 310 • 72
TencentARC/AudioStory-3B

Updated Sep 30 • 6 • 7
LiquidAI/LFM2-Audio-1.5B

Audio-to-Audio • 1B • Updated 3 days ago • 2.58k • 303
inclusionAI/Ming-Lite-Omni-1.5

Any-to-Any • 19B • Updated Aug 29 • 738 • 81
Linq-AI-Research/FinDER

Viewer • Updated Oct 2 • 5.7k • 951 • 7
neuphonic/neutts-air

Text-to-Speech • 0.7B • Updated Oct 10 • 23.1k • 795
nvidia/diar_streaming_sortformer_4spk-v2

Audio Classification • Updated 5 days ago • 11.8k • 77
nvidia/audio-flamingo-3-hf

Audio-Text-to-Text • 8B • Updated 9 days ago • 10.9k • 133
inclusionAI/Ming-flash-omni-Preview

Any-to-Any • 104B • Updated Oct 30 • 7.89k • 64
TMElyralab/MuseTalk

Updated Mar 31 • 131
disco-eth/sao-instruct

Updated Oct 28 • 75 • 4
Soul-AILab/SoulX-Podcast-1.7B-dialect

Text-to-Speech • 2B • Updated Nov 2 • 534 • 24
maya-research/maya1

Text-to-Speech • 3B • Updated 27 days ago • 71.5k • 809
nvidia/parakeet_realtime_eou_120m-v1

Updated 5 days ago • 4.24k • 100
Supertone/supertonic

Text-to-Speech • Updated about 14 hours ago • 16.3k • 418
stepfun-ai/Step-Audio-R1

Audio-Text-to-Text • 33B • Updated 7 days ago • 457 • 123
nari-labs/Dia2-2B

Text-to-Speech • Updated 7 days ago • 10k • 124
stepfun-ai/Step-Audio-EditX

Text-to-Speech • 4B • Updated 10 days ago • 1.11k • 101
facebook/omniASR-CTC-3B

Automatic Speech Recognition • Updated 11 days ago • 2
facebook/omniASR-LLM-1B

Automatic Speech Recognition • Updated 10 days ago • 3
microsoft/VibeVoice-Realtime-0.5B

Text-to-Speech • 1B • Updated about 15 hours ago • 40.5k • 519
