# urgent2026_challenge_track1

Official baseline, dataset, and evaluation scripts for the ICASSP 2026 URGENT challenge.

## Updates

❗️❗️[2025-10-13] A simulation-related issue has been corrected in the baseline code. We strongly recommend pulling the latest version to ensure proper dynamic simulation behavior.

⚠️ Please note: When simulating Vorbis codec distortion, the `qscale` parameter is officially supported in the range [-1, 10], as documented in the FFmpeg Theora/Vorbis Encoding Guide. However, in some versions of torchaudio, setting `qscale=0` with the Vorbis encoder may produce completely silent output. We advise verifying your encoder's behavior and avoiding `qscale=0` unless you have explicitly tested it.
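The snippet below is one way to sanity-check this on your setup: it encodes a short test tone at several `qscale` values and flags silent outputs. This is a minimal sketch assuming torchaudio ≥ 2.1 with the FFmpeg extension available (`StreamWriter` and `CodecConfig` from `torchaudio.io`); it is not how the baseline simulates codec distortion.

```python
# Sanity-check Vorbis qscale behavior in your torchaudio/FFmpeg installation.
# Assumes torchaudio >= 2.1 with the FFmpeg extension enabled.
import torch
import torchaudio
from torchaudio.io import CodecConfig, StreamWriter

sr = 16000
t = torch.arange(sr) / sr
wav = (0.5 * torch.sin(2 * torch.pi * 440 * t)).unsqueeze(1)  # (frames, channels)

for q in [-1, 0, 1, 5, 10]:  # documented qscale range is [-1, 10]
    path = f"vorbis_q{q}.ogg"
    writer = StreamWriter(path)
    writer.add_audio_stream(
        sample_rate=sr, num_channels=1,
        encoder="vorbis", codec_config=CodecConfig(qscale=q),
    )
    with writer.open():
        writer.write_audio_chunk(0, wav)
    out, _ = torchaudio.load(path)
    peak = out.abs().max().item()
    print(f"qscale={q:>2}: peak amplitude {peak:.4f}" + ("  <- silent!" if peak == 0.0 else ""))
```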

## How to run

### Setup

```bash
# Set up the Python environment
conda create -n urgent2026_baseline_track1 python=3.10
conda activate urgent2026_baseline_track1

# Install the baseline code
git clone --recursive git@github.com:urgent-challenge/urgent2026_challenge_track1.git
cd urgent2026_challenge_track1
pip install -e ./
```

### Training Set Preparation

Check the script `utils/prepare_train_data.sh` and set `urgent25_path` in it to the path of the `urgent2025_challenge` project on your system. This script assumes you have already run `prepare_espnet_data.sh` in the `urgent2025_challenge` project.

```bash
bash utils/prepare_train_data.sh
```

By default, this script will generate a pre-simulated training set in `./data/train_simulation` and a dynamic-mixing training set in `./data/train_sources`.

Note: The WSJ dataset was included in `urgent2025_challenge` but has been removed in `urgent2026_challenge_track1`. You can skip the WSJ preparation if you do not have a WSJ license.

The first is the pre-simulated data, which has the following directory structure:

```
data/train_simulation/
├── speech_length.scp  # Speech duration in number of sample points.
├── spk1.scp           # Clean speech file list of ID and audio path.
├── utt2fs             # ID-to-sampling-rate mapping.
├── utt2spk            # Utterance-to-speaker mapping.
└── wav.scp            # Noisy speech file list of ID and audio path.
```

The pre-simulated dataset can be loaded with the `PreSimulatedDataset` class in the baseline code.
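For reference, each `.scp` file is a plain-text list of `<utt_id> <value>` pairs, one per line. The sketch below reads them by hand just to illustrate the format; in practice `PreSimulatedDataset` does this for you (`read_scp` is a hypothetical helper, not part of the baseline API):

```python
# Illustrative parsing of Kaldi-style .scp files ("<utt_id> <value>" per line).
import soundfile as sf

def read_scp(path):
    """Return a {utt_id: value} dict from a .scp file."""
    with open(path) as f:
        return dict(line.strip().split(maxsplit=1) for line in f if line.strip())

noisy = read_scp("data/train_simulation/wav.scp")   # utt_id -> noisy audio path
clean = read_scp("data/train_simulation/spk1.scp")  # utt_id -> clean audio path

utt_id = next(iter(noisy))
noisy_wav, fs = sf.read(noisy[utt_id])
clean_wav, _ = sf.read(clean[utt_id])
```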

We also provide a `DynamicMixingDataset` class in the baseline code for loading data in a dynamic-mixing manner. It expects source lists with the following directory structure:

```
data/train_sources/
├── noise_sources.scp       # Noise audio ID and audio path.
├── rirs.scp                # Room impulse response ID and audio path.
├── source_length.scp       # Speech duration in number of sample points.
├── speech_sources.scp      # Clean speech ID and audio path.
└── wind_noise_sources.scp  # Wind noise audio ID and audio path.
```
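For intuition, dynamic mixing draws a clean source, a noise clip, and a room impulse response from these lists and synthesizes a fresh noisy mixture on the fly at every training step. The sketch below illustrates the idea under simplifying assumptions (single-channel audio, matching sampling rates, no wind noise or codec distortion); it is not the baseline's `DynamicMixingDataset` implementation:

```python
# Simplified dynamic-mixing sketch: reverberate clean speech with a random RIR
# and add noise at a random SNR. Assumes mono sources with matching rates.
import random

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def read_scp(path):
    """Return a {utt_id: path} dict from a .scp file (see the format above)."""
    with open(path) as f:
        return dict(line.strip().split(maxsplit=1) for line in f if line.strip())

def sample_mixture(root="data/train_sources", snr_range=(0.0, 20.0)):
    speech, fs = sf.read(random.choice(list(read_scp(f"{root}/speech_sources.scp").values())))
    noise, _ = sf.read(random.choice(list(read_scp(f"{root}/noise_sources.scp").values())))
    rir, _ = sf.read(random.choice(list(read_scp(f"{root}/rirs.scp").values())))

    # Reverberate the clean speech, keeping its original length.
    reverberant = fftconvolve(speech, rir)[: len(speech)]

    # Loop the noise if it is shorter than the speech, then trim.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise to a randomly drawn SNR (in dB).
    snr_db = random.uniform(*snr_range)
    p_speech = np.mean(reverberant**2) + 1e-10
    p_noise = np.mean(noise**2) + 1e-10
    noise = noise * np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))

    return reverberant + noise, speech, fs  # noisy input, clean target, rate
```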

### Validation Set Preparation

Check the script `utils/prepare_validation_data.sh` and set `urgent25_path` in it to the path of the `urgent2025_challenge` project on your system.

```bash
bash utils/prepare_validation_data.sh
```

By default, this script will generate a simulated validation set in `./data/validation`.

### Download the pre-simulated dataset

A pre-simulated training and validation dataset is available on HuggingFace 🤗. Participants can download and use it directly without running the simulation. However, the simulated speech derived from the ESD subset is excluded due to licensing restrictions; you may apply for the license and run the simulation script yourself to obtain it.
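If you prefer a scripted download, something like the following should work with the `huggingface_hub` package (the repository ID below is a placeholder; use the actual dataset link announced on the challenge page):

```python
# Hypothetical download sketch; replace the placeholder repo_id with the
# dataset repository published by the organizers.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<dataset_repo_id>",  # placeholder, not specified in this README
    repo_type="dataset",
    local_dir="./data/pre_simulated",
)
```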

## Training

Train discriminative baseline SE models:

```bash
python baseline_code/train_se.py --config_file conf/models/BSRNN_baseline.yaml
```

Train generative FlowSE models:

```bash
python baseline_code/train_se.py --config_file conf/models/BSRNN_flowse.yaml
```

Train with dynamic mixing:

Set `train_set_dynamic_mixing: True` and `train_set_path: ./data/train_sources` in the config file:

```bash
python baseline_code/train_se.py --config_file conf/models/BSRNN_baseline_dm.yaml
```

If you see an error message like `Failed to initialize FFmpeg extension. Please make sure FFmpeg has been installed on your machine`, try `conda install ffmpeg`.

## Pretrained models

We provide pretrained model checkpoints for both BSRNN and BSRNN-Flow. The training data, 700h-TBF, is a subset of the ICASSP 2026 URGENT challenge data; a detailed description can be found in our recent paper.

| Model | Training Data | Download |
|-------|---------------|----------|
| BSRNN | 700h-TBF | HuggingFace 🤗 |
| BSRNN-Flow | 700h-TBF | HuggingFace 🤗 |

If you use the above models, we would appreciate a citation of the following paper:

```bibtex
@article{liLessMoreData2025,
  title = {Less is {More}: {Data} {Curation} {Matters} in {Scaling} {Speech} {Enhancement}},
  shorttitle = {Less is {More}},
  url = {http://arxiv.org/abs/2506.23859},
  doi = {10.48550/arXiv.2506.23859},
  urldate = {2025-09-10},
  publisher = {arXiv},
  author = {Li, Chenda and Zhang, Wangyou and Wang, Wei and Scheibler, Robin and Saijo, Kohei and Cornell, Samuele and Fu, Yihui and Sach, Marvin and Ni, Zhaoheng and Kumar, Anurag and Fingscheidt, Tim and Watanabe, Shinji and Qian, Yanmin},
  month = aug,
  year = {2025},
  note = {arXiv:2506.23859 [eess]},
  keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
  annote = {Comment: Accepted by ASRU2025},
}
```

## Inference

After downloading the checkpoints we provide or completing your own training, you can run the following script for speech enhancement.

```bash
python baseline_code/inference.py --input_scp [path_to_input_scp] --output [output_dir] --ckpt_path [path_to_checkpoint]
```
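The `--input_scp` file is presumably a `wav.scp`-style list of `<utt_id> <audio_path>` pairs, matching the format described in the data preparation sections above, e.g.:

```
utt_0001 /path/to/noisy/utt_0001.wav
utt_0002 /path/to/noisy/utt_0002.wav
```

(The IDs and paths here are illustrative only.)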

## Evaluation

First, install the necessary dependencies for evaluation (make sure you have finished the setup steps above):

```bash
pip install pip==24.0  # Some packages need this version.
pip install -r evaluation_metrics/requirements.txt
```

Check the variables in `evaluation_metrics/eval_all.sh` and run it:

```bash
bash evaluation_metrics/eval_all.sh
```
