multi-talker-whisper-small-ami

ESPnet checkpoint for serialized output training (SOT) multi-talker ASR on the AMI meeting corpus, built on top of openai/whisper-small. The model is trained to emit a single transcript that contains every speaker in FIFO order, with speakers separated by a <sc> speaker-change token and with per-speaker Whisper-style timestamps.
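To make the serialized output concrete, here is a minimal sketch that splits one SOT hypothesis into per-speaker pieces at the <sc> token. The timestamp spelling (`<|0.00|>`), the example string, and the helper name `split_sot_transcript` are assumptions for illustration only; the exact text the recipe emits may differ.

```python
import re

SC_TOKEN = "<sc>"  # speaker-change token named in this model card
# Assumed Whisper-style timestamp spelling, e.g. <|1.20|>
TS_PATTERN = re.compile(r"<\|(\d+\.\d+)\|>")


def split_sot_transcript(text):
    """Split a serialized transcript into per-speaker segments (hypothetical helper)."""
    segments = []
    for chunk in text.split(SC_TOKEN):
        # Collect every timestamp in this speaker's chunk, in order of appearance.
        times = [float(t) for t in TS_PATTERN.findall(chunk)]
        # Remove the timestamp tokens to leave only the words.
        words = TS_PATTERN.sub(" ", chunk).split()
        if not words and not times:
            continue  # skip empty chunks, e.g. after a trailing <sc>
        segments.append(
            {
                "text": " ".join(words),
                "start": times[0] if times else None,
                "end": times[-1] if len(times) > 1 else None,
            }
        )
    return segments


# Made-up example string, not real model output.
example = "<|0.00|> hello there <|1.20|> <sc> <|0.80|> hi <|1.50|>"
for segment in split_sot_transcript(example):
    print(segment)
```

The segments come back in the same order as they appear in the transcript, so the FIFO speaker order is preserved.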

Files

File	Purpose
`model.pth`	ESPnet-format weights (479 keys)
`config.yaml`	Model architecture / preprocessor spec
`token_list.txt`	51,865-token Whisper multilingual vocabulary
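Before running the recipe, it can help to confirm that the three files were downloaded intact. The sketch below checks that each file exists and that the token list has the expected number of entries. It assumes `token_list.txt` stores one token per line; `check_release_dir` is a hypothetical helper, not part of ESPnet.

```python
from pathlib import Path

EXPECTED_FILES = ("model.pth", "config.yaml", "token_list.txt")
EXPECTED_VOCAB_SIZE = 51865  # vocabulary size stated in the table above


def check_release_dir(release_dir):
    """Return a list of problems found in the directory (empty list means OK)."""
    root = Path(release_dir)
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    token_file = root / "token_list.txt"
    if token_file.is_file():
        # Assumption: one token per line.
        with token_file.open(encoding="utf-8") as handle:
            n_tokens = sum(1 for _ in handle)
        if n_tokens != EXPECTED_VOCAB_SIZE:
            problems.append(
                f"token_list.txt has {n_tokens} lines, expected {EXPECTED_VOCAB_SIZE}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_release_dir("exp/whisper-sot-small-ami"):
        print(problem)
```

An empty result means all three files are present and the token count matches; it does not validate the weights themselves.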

Usage

This checkpoint is consumed by the egs2/ami/sot_asr1 recipe. After cloning ESPnet and preparing the AMI test data, place the three files under exp/whisper-sot-small-ami/ and run:

./run.sh --released_model exp/whisper-sot-small-ami \
         --whisper_model small \
         --decode_test_sets test

The recipe writes the utterance-group cpWER and utterance-group DER results under exp/whisper-sot-small-ami/decode_released/test/eval/.
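For readers unfamiliar with cpWER (concatenated minimum-permutation word error rate), the following is a minimal sketch of the metric for a single utterance group. It is not the recipe's scoring code: the function names are hypothetical, text normalization is omitted, and the brute-force search over speaker permutations is only practical for a handful of speakers.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cpwer(ref_speakers, hyp_speakers):
    """cpWER for one group; each argument holds one concatenated string per speaker."""
    refs = [s.split() for s in ref_speakers]
    hyps = [s.split() for s in hyp_speakers]
    # Pad the shorter side with empty speakers so every speaker gets a partner.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Try every speaker assignment and keep the one with the fewest errors.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


# Made-up example: the hypothesis speakers are swapped and one word is wrong.
print(cpwer(["hello world", "good morning"], ["good morning", "hello word"]))  # 0.25
```

Because the best speaker permutation is chosen before scoring, the metric penalizes words attributed to the wrong speaker without requiring the hypothesis to use the reference speaker labels.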

Results (AMI SDM test, beam = 5, temperature = 0)

cpWER (%)

overall	1-spk	2-spk	3-spk	4-spk
27.95	15.36	25.54	38.94	52.44

DER (collar = 0.25 s, %)

overall	1-spk	2-spk	3-spk	4-spk
9.84	1.47	6.99	18.65	29.43
