vanshnawander/whisper-small-telugu

This is a fine-tuned version of openai/whisper-small for Telugu automatic speech recognition (ASR).

Model Description

Base Model: openai/whisper-small
Language: Telugu (te)
Task: Automatic Speech Recognition (transcribe)
Training Data: ai4bharat/Kathbath
Fine-tuning Framework: Transformers + Custom DALI Pipeline

Training Details

The model was fine-tuned on the Kathbath Telugu dataset with the following configuration:

Epochs: 3
Batch Size: 16 (effective ~96 with gradient accumulation)
Learning Rate: 1e-5
Mixed Precision: FP16
Gradient Checkpointing: Enabled
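
The effective batch size above is the per-step batch size multiplied by the number of gradient accumulation steps (and by the number of devices, if more than one is used). A minimal sketch of that arithmetic follows. The accumulation step count of 6 and the single device are assumptions inferred from 96 / 16; they are not stated in the training configuration.

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


# Assumed values: 6 accumulation steps on a single device.
print(effective_batch_size(per_device_batch=16, grad_accum_steps=6))
```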

Evaluation Results

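Evaluation targets the Telugu subset of Shrutilipi, a large-scale ASR dataset for Indian languages. WER and CER have not been reported yet, so the entries in the table below are placeholders.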

Model	WER (%)	CER (%)	Improvement (%)
Base (openai/whisper-small)	N/A	N/A	-
This Model	N/A	N/A	N/A
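
For reference, WER and CER are both edit distances normalized by the reference length, computed over words and characters respectively. The sketch below is a plain-Python illustration, not the evaluation script used for this model. The function names are hypothetical, CER is computed over Unicode code points (Telugu grapheme clusters may span several code points), and the Improvement column is assumed to mean relative error reduction.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_improvement(base_error: float, new_error: float) -> float:
    """Relative error reduction in percent (assumed meaning of 'Improvement')."""
    return (base_error - new_error) / base_error * 100.0
```

Published WER figures usually depend on text normalization (punctuation, numerals, whitespace), so scores from this sketch are only comparable when both models are scored with the same normalization.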

Usage

Basic Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load model and processor
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-small-telugu")
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-small-telugu")

# Load audio
audio, sr = librosa.load("audio.wav", sr=16000)

# Transcribe
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
generated_ids = model.generate(input_features, language="te", task="transcribe")
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(transcription)

Using Pipeline

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="vanshnawander/whisper-small-telugu",
    chunk_length_s=30,
)

result = pipe("audio.wav", generate_kwargs={"language": "te", "task": "transcribe"})
print(result["text"])
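
The chunk_length_s=30 setting matches Whisper's 30-second input window: longer recordings are split into windows and the partial transcripts are merged. The sketch below only illustrates how sample ranges could be derived for such windows. It is a simplification and not the pipeline's implementation, which uses overlapping strides and merges the overlapping text. The function name and the overlap_s parameter are hypothetical.

```python
def chunk_bounds(num_samples: int,
                 sample_rate: int = 16000,
                 chunk_length_s: float = 30.0,
                 overlap_s: float = 0.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive windows over a recording."""
    chunk = int(chunk_length_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_length_s")

    bounds = []
    start = 0
    while start < num_samples:
        bounds.append((start, min(start + chunk, num_samples)))
        if start + chunk >= num_samples:
            break  # the last window already reaches the end of the audio
        start += step
    return bounds


# A 70-second recording at 16 kHz yields two full windows and one partial window.
for start, end in chunk_bounds(70 * 16000):
    print(start, end)
```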

Limitations

Optimized for Telugu speech; may not perform well on other languages
Best performance on clear audio with minimal background noise
May struggle with very fast speech or heavy code-mixing

Citation

If you use this model, please cite:

@misc{vanshnawander_whisper_small_telugu,
  author = {AI4Bharat},
  title = {vanshnawander/whisper-small-telugu},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vanshnawander/whisper-small-telugu}
}

Acknowledgments

OpenAI Whisper for the base model
AI4Bharat for the Kathbath and Shrutilipi datasets
Hugging Face for the transformers library

Model Size: 0.2B params (Safetensors)
Tensor Type: F32
