ReVoice-2025: Speech Enhancement Hackathon (Baseline)

This repository provides a baseline solution for the ReVoice-2025 hackathon. It is based on the Miipher model, adapted for the competition. We tried to keep the code clean, fast, and convenient to use.

🚀 Quick Start

1. Environment Setup

Python 3.10.11 is recommended.

git clone https://github.com/mtuciru/ReVoice-2025
cd ReVoice-2025

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git

export PYTHONPATH=./src 

2. Downloading Pre-trained Weights

The script will automatically download Miipher and HiFiGAN weights to the ./models folder.

python3 scripts/download_weights.py
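
After the download finishes, it can be useful to confirm that both checkpoints are actually in place before starting training or inference. The sketch below is a minimal check, not part of the repository. The file names miipher.ckpt and hifigan.ckpt are taken from the inference command in this README and are an assumption about what the downloader writes; `missing_weights` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the inference command in this README; the downloader
# itself may use different names, so treat them as assumptions.
EXPECTED_WEIGHTS = ("miipher.ckpt", "hifigan.ckpt")


def missing_weights(models_dir="./models", expected=EXPECTED_WEIGHTS):
    """Return the expected checkpoint names that are absent or empty."""
    root = Path(models_dir)
    missing = []
    for name in expected:
        path = root / name
        # A zero-byte file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_weights()` from the repository root returns an empty list when both files are present and non-empty.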

3. Dataset Preparation

Training the model requires a prepared dataset (clean + noisy audio + phonemes). The script takes your folder with clean audio, adds noise (using the degrader config), and generates phonemes (using GigaAM for transcription if no text is present).

Important: Before running, edit examples/configs/degrader_config.yaml and set the paths to your noise files (for example the noise_dir parameter, if it is used).

python3 scripts/prepare_dataset.py \
  --input_dir /path/to/clean_audio \
  --output_dir /path/to/processed_dataset \
  --degrader_config examples/configs/degrader_config.yaml
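
To illustrate what "adds noise" means in practice, here is a minimal sketch of mixing a noise recording into clean speech at a chosen signal-to-noise ratio. It is not the degrader used by prepare_dataset.py, whose exact augmentations are defined by its config. `mix_at_snr` is a hypothetical helper, and the signals are assumed to be mono NumPy arrays at the same sample rate.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so the mixture has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Loop the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0 or noise_power == 0:
        return clean.copy()  # nothing meaningful to scale

    # Choose the gain so that
    # clean_power / (gain**2 * noise_power) == 10 ** (snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

Lower `snr_db` values produce harder training examples; a real degradation pipeline typically also applies effects such as reverberation or codec artifacts, which this sketch does not cover.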

4. Training Configuration

All training settings are located in examples/configs/config.yaml. Main parameters to check:

  • data.train_dataset_path: Path to the folder you created in step 3.
  • data.val_dataset_path: Path to the validation set.
  • train.trainer.devices: Number of GPUs to use, or a list of GPU IDs (default: 1).
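
A wrong dataset path is the most common reason a training run fails at startup, so a quick pre-flight check can save time. The sketch below assumes the config is plain YAML with nested mappings matching the dotted names above; if the project resolves its config through another loader with interpolation, values may need resolving first. `get_dotted`, `check_dataset_paths`, and `load_config` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def get_dotted(config, dotted_key):
    """Look up a key such as 'data.train_dataset_path' in nested dicts."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"missing config key: {dotted_key}")
        node = node[part]
    return node


def check_dataset_paths(
    config, keys=("data.train_dataset_path", "data.val_dataset_path")
):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in keys:
        try:
            value = get_dotted(config, key)
        except KeyError as err:
            problems.append(str(err.args[0]))
            continue
        if not Path(str(value)).is_dir():
            problems.append(f"{key} is not an existing directory: {value}")
    return problems


def load_config(path="examples/configs/config.yaml"):
    """Read the YAML file into a dict (requires PyYAML)."""
    import yaml  # imported lazily so the checks above work without PyYAML

    with open(path, "r", encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

Running `check_dataset_paths(load_config())` from the repository root returns an empty list when both dataset folders exist.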

5. Starting Training

python3 examples/train.py

6. Monitoring (TensorBoard)

Monitor training progress and metrics:

tensorboard --logdir logs/

7. Inference (Speech Restoration)

To restore speech from noisy files, use the run_miipher.py script. It takes a folder with input files and a folder to save the result.

python3 scripts/run_miipher.py \
  --input_dir /path/to/noisy_audio \
  --output_dir /path/to/restored_audio \
  --lang_code rus \
  --miipher_ckpt ./models/miipher.ckpt \
  --vocoder_ckpt ./models/hifigan.ckpt

Arguments:

  • --input_dir: Folder with noisy files (.wav, .mp3, .flac).
  • --output_dir: Folder where restored files will be saved.
  • --lang_code: Language code for phonemization (default: rus). If text transcripts (.txt) are present, the script will try to use them; otherwise it falls back to ASR (GigaAM).
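
To preview which files an inference run would pick up, and which of them would fall back to ASR, the folder can be scanned ahead of time. This is a sketch under an assumption that is not confirmed by the README: a transcript is taken to be a .txt file next to the audio with the same base name. `list_inputs` is a hypothetical helper, and run_miipher.py may locate transcripts differently.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}  # formats listed for --input_dir


def list_inputs(input_dir):
    """Yield (audio_path, transcript_path_or_None) pairs in sorted order."""
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # Assumption: a transcript sits next to the audio with the same stem.
        transcript = path.with_suffix(".txt")
        yield path, (transcript if transcript.is_file() else None)
```

Entries whose second element is `None` are the ones that would need automatic transcription under this assumption.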

8. Quality Evaluation (Metrics)

To calculate metrics (SI-SNR, STOI, MelLoss), use eval.py. The script compares the folder with restored files (hypotheses) and the folder with clean reference files (references).

python3 eval.py \
  --hyp_dir /path/to/restored_audio \
  --ref_dir /path/to/clean_reference_audio \
  --output_csv metrics_results.csv
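
For readers unfamiliar with SI-SNR, the sketch below shows the standard scale-invariant definition in NumPy: the estimate is projected onto the reference, and the ratio of the projected energy to the residual energy is reported in dB. It is an illustration of the metric, not the code in src/miipher/metrics/eval_metrics.py, which may differ in details such as the epsilon value or batching. `si_snr` is a hypothetical function name.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if estimate.shape != reference.shape:
        raise ValueError("signals must have the same shape")

    # Remove DC offset so a constant bias does not count as signal.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target

    ratio = (np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps)
    return 10.0 * np.log10(ratio)
```

Because of the projection step, rescaling the restored audio does not change the score, which is why the metric is useful when output loudness differs from the reference.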

Arguments:

  • --hyp_dir: Folder with your restored files.
  • --ref_dir: Folder with clean original files (files must have matching names).
  • --output_csv: Path to save the results table (default metrics_results.csv).
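
Since the two folders are compared file by file, it can help to check the pairing before running the evaluation. The sketch below assumes "matching names" means identical file names including the extension, which the README does not spell out. `pair_by_name` is a hypothetical helper, not part of eval.py.

```python
from pathlib import Path


def pair_by_name(hyp_dir, ref_dir):
    """Return (pairs, missing): matched (hyp, ref) paths and unmatched names."""
    refs = {p.name: p for p in Path(ref_dir).iterdir() if p.is_file()}
    pairs, missing = [], []
    for hyp in sorted(Path(hyp_dir).iterdir()):
        if not hyp.is_file():
            continue
        ref = refs.get(hyp.name)
        if ref is None:
            missing.append(hyp.name)  # no reference with the same name
        else:
            pairs.append((hyp, ref))
    return pairs, missing
```

A non-empty `missing` list points at restored files that would have no reference to be scored against.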

📂 Project Structure

  • examples/train.py: Main script for starting training.
  • examples/configs/config.yaml: Configuration for hyperparameters, paths, and the model.
  • run_miipher.py: Script for running inference on a folder.
  • eval.py: Script for calculating metrics on a folder.
  • scripts/prepare_dataset.py: Script for dataset generation (augmentation + phonemization).
  • scripts/download_weights.py: Weight downloader.
  • src/miipher/lightning_module.py: Training logic (PyTorch Lightning), training step, validation, metrics.
  • src/miipher/dataset: Data loading logic (Dataset, DataModule).
  • src/miipher/metrics/eval_metrics.py: Implementation of SI-SNR, STOI, MelLoss metrics.