multi-talker-whisper-small-ami

ESPnet checkpoint for serialized output training (SOT) multi-talker ASR on the AMI meeting corpus, built on top of openai/whisper-small. The model is trained to emit a single transcript that contains every speaker's speech in first-in, first-out (FIFO) order, with speakers separated by a <sc> speaker-change token and with per-speaker Whisper-style timestamps.
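
As an illustration of the serialized output format described above, here is a minimal sketch that splits a transcript on the <sc> token and pulls out timestamps. The exact string layout is an assumption: the sketch supposes timestamps appear as Whisper-style `<|1.20|>` tokens at the start and end of each speaker's text, and the function name `split_sot_transcript` is hypothetical, not part of ESPnet.

```python
import re

SC_TOKEN = "<sc>"  # speaker-change token named in the description above
# Assumed Whisper-style timestamp token, e.g. <|1.20|>
TS_PATTERN = re.compile(r"<\|(\d+\.\d+)\|>")


def split_sot_transcript(text):
    """Split a serialized transcript into per-speaker segments."""
    segments = []
    for chunk in text.split(SC_TOKEN):
        times = [float(t) for t in TS_PATTERN.findall(chunk)]
        words = TS_PATTERN.sub(" ", chunk).split()
        if not words and not times:
            continue  # skip empty chunks, e.g. from a trailing <sc>
        segments.append({
            "start": times[0] if times else None,  # first timestamp in the chunk
            "end": times[-1] if times else None,   # last timestamp in the chunk
            "text": " ".join(words),
        })
    return segments


# Made-up example string, not real model output
example = "<|0.00|> hello there <|1.20|> <sc> <|0.80|> hi <|1.50|>"
for seg in split_sot_transcript(example):
    print(seg)
```

Because segments are serialized in FIFO order, overlapping speech shows up as a later segment whose start time precedes the previous segment's end time, as in the made-up example.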

Files

File            Purpose
model.pth       ESPnet-format weights (479 keys)
config.yaml     Model architecture / preprocessor spec
token_list.txt  51,865-token Whisper multilingual vocabulary
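
Before running the recipe, it can help to confirm that a download is complete. The following is a rough sketch, not part of ESPnet: the helper name `check_checkpoint_dir` is hypothetical, and the token count assumes token_list.txt holds one token per line.

```python
from pathlib import Path

EXPECTED_FILES = ("model.pth", "config.yaml", "token_list.txt")


def check_checkpoint_dir(exp_dir):
    """Return (missing_files, token_count) for a checkpoint directory."""
    exp_dir = Path(exp_dir)
    missing = [name for name in EXPECTED_FILES if not (exp_dir / name).is_file()]
    token_count = None
    token_file = exp_dir / "token_list.txt"
    if token_file.is_file():
        # Assumption: one token per line
        with token_file.open(encoding="utf-8") as f:
            token_count = sum(1 for _ in f)
    return missing, token_count
```

The returned token count can be compared against the 51,865 figure listed in the table; a mismatch may point to a truncated download or to a different line layout than assumed here.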

Usage

This checkpoint is consumed by the egs2/ami/sot_asr1 recipe. After cloning ESPnet and preparing the AMI test data, place the three files under exp/whisper-sot-small-ami/ and run:

./run.sh --released_model exp/whisper-sot-small-ami \
         --whisper_model small \
         --decode_test_sets test

The recipe writes utterance-group cpWER and utterance-group DER results under exp/whisper-sot-small-ami/decode_released/test/eval/.
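
For readers unfamiliar with cpWER (concatenated minimum-permutation word error rate), here is a minimal sketch of the general definition: concatenate each speaker's words, try every assignment of hypothesis speakers to reference speakers, keep the assignment with the fewest word errors, and divide by the number of reference words. This is not the recipe's scoring code, and its utterance-group variant may differ in how audio is grouped; the function names are hypothetical.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cpwer(ref_speakers, hyp_speakers):
    """Each argument is a list with one concatenated transcript per speaker."""
    refs = [s.split() for s in ref_speakers]
    hyps = [s.split() for s in hyp_speakers]
    # Pad the shorter side with empty speakers so every speaker gets a partner
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Exhaustive search over speaker assignments; fine for a handful of speakers
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words
```

The exhaustive permutation search is adequate for the one to four speakers reported here; with many speakers, an assignment solver such as the Hungarian algorithm would be the usual substitute.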

Results (AMI SDM test, beam = 5, temperature = 0)

cpWER (%)

overall  1-spk  2-spk  3-spk  4-spk
27.95    15.36  25.54  38.94  52.44

DER (collar = 0.25 s, %)

overall  1-spk  2-spk  3-spk  4-spk
9.84     1.47   6.99   18.65  29.43