How to use espnet/multi-talker-whisper-small-ami with ESPnet:

```python
import soundfile
from espnet2.bin.asr_inference import Speech2Text

model = Speech2Text.from_pretrained(
    "espnet/multi-talker-whisper-small-ami"
)

# Read the waveform; `rate` is returned by soundfile but not used below.
speech, rate = soundfile.read("speech.wav")
# Take the first n-best entry; its first element is the decoded text.
text, *_ = model(speech)[0]
```
multi-talker-whisper-small-ami
ESPnet checkpoint for serialized-output-training (SOT) multi-talker ASR on
the AMI meeting corpus, built on top of openai/whisper-small. It is trained to
emit a single transcript that covers every speaker in first-in, first-out
(FIFO) order, with speakers separated by a `<sc>` speaker-change token and
with per-speaker Whisper-style timestamps.
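To make the serialized format concrete, here is a minimal sketch of splitting one SOT hypothesis back into per-speaker pieces. It assumes the timestamps appear as Whisper-style `<|12.34|>` tokens inside the text; the card does not spell out the exact string layout, so the pattern, the helper name `split_sot_transcript`, and the example string are all illustrative.

```python
import re

SC_TOKEN = "<sc>"  # speaker-change token named in this card
# Assumption: timestamps use Whisper's <|seconds|> token syntax.
TS_PATTERN = re.compile(r"<\|(\d+(?:\.\d+)?)\|>")


def split_sot_transcript(serialized):
    """Split one serialized transcript into per-speaker text and time span."""
    speakers = []
    for chunk in serialized.split(SC_TOKEN):
        times = [float(t) for t in TS_PATTERN.findall(chunk)]
        # Drop the timestamp tokens and normalize whitespace.
        text = " ".join(TS_PATTERN.sub(" ", chunk).split())
        if not text and not times:
            continue  # skip empty pieces, e.g. around a trailing <sc>
        speakers.append({
            "text": text,
            "start": min(times) if times else None,
            "end": max(times) if times else None,
        })
    return speakers


# Made-up hypothesis string, not real model output.
example = "<|0.00|> hello there <|1.20|> <sc> <|0.80|> hi <|1.50|>"
for index, speaker in enumerate(split_sot_transcript(example)):
    print(index, speaker)
```

The list order follows the serialized order, so the first entry is the first speaker the model emitted.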
Files
| File | Purpose |
|---|---|
| `model.pth` | ESPnet-format weights (479 keys) |
| `config.yaml` | Model architecture / preprocessor spec |
| `token_list.txt` | 51,865-token Whisper multilingual vocabulary |
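
One possible way to fetch the three files into the directory the recipe expects is sketched below. It assumes the `huggingface_hub` package is installed and that the files sit at the top level of the model repository; the helper names `fetch_checkpoint` and `missing_files` are illustrative and not part of ESPnet.

```python
from pathlib import Path

REPO_ID = "espnet/multi-talker-whisper-small-ami"
FILES = ("model.pth", "config.yaml", "token_list.txt")
TARGET_DIR = Path("exp/whisper-sot-small-ami")  # directory named in Usage


def fetch_checkpoint(target_dir=TARGET_DIR):
    """Download the three checkpoint files into target_dir (needs network)."""
    # Imported lazily so missing_files() works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target_dir))
        for name in FILES
    ]


def missing_files(target_dir=TARGET_DIR):
    """Return the expected file names that are not present in target_dir."""
    return [name for name in FILES if not (Path(target_dir) / name).is_file()]


# Usage (run from the recipe directory):
#   fetch_checkpoint()
#   print(missing_files())  # an empty list means all three files are in place
```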
Usage
This checkpoint is consumed by the `egs2/ami/sot_asr1` recipe. After cloning
ESPnet and preparing the AMI test data, place the three files under
`exp/whisper-sot-small-ami/` and run:

```shell
./run.sh --released_model exp/whisper-sot-small-ami \
    --whisper_model small \
    --decode_test_sets test
```
The recipe reports utterance-group cpWER (concatenated minimum-permutation
word error rate) and utterance-group DER (diarization error rate) under
`exp/whisper-sot-small-ami/decode_released/test/eval/`.
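
As a rough illustration of what cpWER measures, the sketch below scores one utterance group by trying every assignment of hypothesis speakers to reference speakers and keeping the one with the fewest word errors. It is not the recipe's scoring code: text normalization and the grouping of utterances are left out, and the function names are illustrative. Brute-force permutation is adequate for the one to four speakers reported here; larger speaker counts would call for an assignment solver.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cpwer(ref_speakers, hyp_speakers):
    """cpWER for one group; each argument is a list of per-speaker strings."""
    refs = [s.split() for s in ref_speakers]
    hyps = [s.split() for s in hyp_speakers]
    # Pad the shorter side with empty transcripts so every speaker is paired.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    n_ref_words = sum(len(r) for r in refs)
    if n_ref_words == 0:
        raise ValueError("reference contains no words")
    # Minimum total errors over all speaker assignments.
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / n_ref_words


# Made-up example: hypothesis speakers come out in the opposite order.
print(cpwer(["hello there", "hi"], ["hi", "hello where"]))
```

Because the assignment is optimized, swapping the speaker order in the hypothesis does not add errors; only word mistakes and missed or extra speakers do.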
Results (AMI SDM test, beam = 5, temperature = 0)
cpWER (%)
| overall | 1-spk | 2-spk | 3-spk | 4-spk |
|---|---|---|---|---|
| 27.95 | 15.36 | 25.54 | 38.94 | 52.44 |
DER (collar = 0.25 s, %)
| overall | 1-spk | 2-spk | 3-spk | 4-spk |
|---|---|---|---|---|
| 9.84 | 1.47 | 6.99 | 18.65 | 29.43 |