ReVoice-2025: Speech Enhancement Hackathon (Baseline)
This repository provides a baseline solution for the ReVoice-2025 hackathon. The project is based on the Miipher model and adapted for the competition. We tried to keep the code clean, fast, and convenient.
Quick Start
1. Environment Setup
Python 3.10.11 is recommended.
git clone https://github.com/mtuciru/ReVoice-2025
cd ReVoice-2025
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git
export PYTHONPATH=./src
2. Downloading Pre-trained Weights
The script will automatically download Miipher and HiFiGAN weights to the ./models folder.
python3 scripts/download_weights.py
3. Dataset Preparation
Training the model requires a prepared dataset (clean + noisy audio + phonemes). The script takes your folder with clean audio, adds noise (using the degrader config), and generates phonemes (using GigaAM for transcription if no text is present).
Important: Before running, edit examples/configs/degrader_config.yaml and set the paths to your noise files (noise_dir and any similar parameters, if used).
python3 scripts/prepare_dataset.py \
--input_dir /path/to/clean_audio \
--output_dir /path/to/processed_dataset \
--degrader_config examples/configs/degrader_config.yaml
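To illustrate what "adds noise" means in this step, here is a minimal sketch of mixing a noise clip into clean audio at a target signal-to-noise ratio. It is not the repository's degrader: the function name mix_at_snr and the snr_db parameter are assumptions made for illustration, and the actual degradations are defined by degrader_config.yaml.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean` so that the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Tile or trim the noise so that it covers the clean signal exactly.
    if len(noise) < len(clean):
        repeats = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, repeats)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    # Choose the scale so that clean_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


# Synthetic example: a 220 Hz tone (1 s at 16 kHz) plus white noise at 10 dB SNR.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

Lower snr_db values produce harder training examples, so a degrader typically samples the SNR from a range rather than using a single value.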
4. Training Configuration
All training settings are located in examples/configs/config.yaml.
Main parameters to check:
- data.train_dataset_path: Path to the folder you created in step 3.
- data.val_dataset_path: Path to the validation set.
- train.trainer.devices: Number and IDs of GPUs (default: 1).
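Before starting a long training run, it can help to confirm that the dataset paths in the config point to existing folders. The sketch below assumes the dotted parameter names map to nested keys in config.yaml; the helper names and the example dictionary are hypothetical, and in practice the dictionary would come from loading the YAML file.

```python
from pathlib import Path


def get_by_dotted_key(config: dict, dotted_key: str):
    """Look up a nested value, e.g. 'data.train_dataset_path'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def missing_dataset_paths(
    config: dict,
    keys=("data.train_dataset_path", "data.val_dataset_path"),
) -> list[str]:
    """Return a description of every configured path that is not a directory."""
    missing = []
    for key in keys:
        path = Path(get_by_dotted_key(config, key))
        if not path.is_dir():
            missing.append(f"{key} -> {path}")
    return missing


# Hypothetical config contents; the layout is assumed from the parameter names.
example_config = {
    "data": {
        "train_dataset_path": "/path/to/processed_dataset",
        "val_dataset_path": "/path/to/val_dataset",
    },
    "train": {"trainer": {"devices": 1}},
}
print(missing_dataset_paths(example_config))
```

An empty list means both folders exist; otherwise each entry names the key and the path that needs fixing.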
5. Starting Training
python3 examples/train.py
6. Monitoring (TensorBoard)
Monitor training progress and metrics:
tensorboard --logdir logs/
7. Inference (Speech Restoration)
To restore speech from noisy files, use the run_miipher.py script. It takes a folder with input files and a folder to save the result.
python3 scripts/run_miipher.py \
--input_dir /path/to/noisy_audio \
--output_dir /path/to/restored_audio \
--lang_code rus \
--miipher_ckpt ./models/miipher.ckpt \
--vocoder_ckpt ./models/hifigan.ckpt
Arguments:
- --input_dir: Folder with noisy files (.wav, .mp3, .flac).
- --output_dir: Folder where restored files will be saved.
- --lang_code: Language code for phonetization (default: rus). If text transcripts (.txt) exist, the script will try to find them. Otherwise, ASR (GigaAM) will be used.
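The transcript fallback described above can be previewed before running inference. The sketch below assumes that a transcript sits next to its audio file and shares the same base name (for example, sample.wav and sample.txt); the README does not state where run_miipher.py actually looks, so treat that pairing rule and the function name plan_transcripts as assumptions.

```python
from pathlib import Path

# Extensions listed for --input_dir.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def plan_transcripts(input_dir: str) -> dict[str, str]:
    """Map each audio file name to 'txt' (transcript found) or 'asr' (fallback).

    Hypothetical helper: it assumes the transcript shares the audio file's
    base name and lives in the same folder.
    """
    plan = {}
    for audio_path in sorted(Path(input_dir).iterdir()):
        if audio_path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip transcripts and unrelated files
        transcript_path = audio_path.with_suffix(".txt")
        plan[audio_path.name] = "txt" if transcript_path.is_file() else "asr"
    return plan
```

Calling plan_transcripts("/path/to/noisy_audio") shows which files would rely on ASR, which is useful when recognition errors are a concern for a particular recording.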
8. Quality Evaluation (Metrics)
To calculate metrics (SI-SNR, STOI, MelLoss), use eval.py. The script compares the folder with restored files (hypotheses) and the folder with clean reference files (references).
python3 eval.py \
--hyp_dir /path/to/restored_audio \
--ref_dir /path/to/clean_reference_audio \
--output_csv metrics_results.csv
Arguments:
- --hyp_dir: Folder with your restored files.
- --ref_dir: Folder with clean original files (files must have matching names).
- --output_csv: Path to save the results table (default: metrics_results.csv).
Project Structure
- examples/train.py: Main script for starting training.
- examples/configs/config.yaml: Configuration for hyperparameters, paths, and the model.
- run_miipher.py: Script for running inference on a folder.
- eval.py: Script for calculating metrics on a folder.
- scripts/prepare_dataset.py: Script for dataset generation (augmentation + phonemization).
- scripts/download_weights.py: Weight downloader.
- src/miipher/lightning_module.py: Training logic (PyTorch Lightning), training step, validation, metrics.
- src/miipher/dataset: Data loading logic (Dataset, DataModule).
- src/miipher/metrics/eval_metrics.py: Implementation of SI-SNR, STOI, MelLoss metrics.
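For readers unfamiliar with SI-SNR (scale-invariant signal-to-noise ratio), the sketch below shows the standard definition for a single pair of equal-length waveforms. It is an illustration only and may differ in detail from the implementation in src/miipher/metrics/eval_metrics.py; the function name si_snr and the eps default are assumptions.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    # Remove any DC offset so that only the waveform shape is compared.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything that the projection does not explain counts as distortion.
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return float(10 * np.log10(ratio))


# Synthetic example: a tone with a small amount of added white noise.
rng = np.random.default_rng(0)
reference = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
estimate = reference + 0.1 * rng.standard_normal(16000)
print(f"SI-SNR: {si_snr(estimate, reference):.2f} dB")
```

Because the estimate is projected onto the reference, rescaling the restored audio does not change the score, so loudness differences between hypothesis and reference files are not penalized.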