Low Resource Audio Codec (LRAC) Challenge 2025

This repository contains the official data preparation tools for the LRAC Challenge.

This repository is a fork of the URGENT 2025 Challenge repository and adapts its data preparation scripts and general structure for our challenge.

The goal of the challenge is to develop an audio codec that can compress speech to a very low bitrate while maintaining the highest possible perceptual quality and intelligibility.

Updates

❗️❗️**[2025-09-01]** Removed the sampling rate field from the noise and RIR scp files for compatibility with the baseline.

❗️❗️**[2025-08-25]** Added lists of files used for the open test set (datafiles/open_testset). Added evaluation data preparation for the baseline recipe.

❗️❗️**[2025-08-06]** First commit containing the data preparation core functionality.

Getting Started

Prerequisites

- OS: Linux
- Disk Space: At least 1.2 TB of free disk space for datasets.
- Dependencies: ffmpeg is required for audio processing.
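
The prerequisites above can be checked before starting the download. The following is a minimal sketch, not part of the repository: the helper names (`check_os`, `check_cmd`, `free_kb`) are hypothetical, and the threshold assumes that 1.2 TB means 1.2 × 10^12 bytes. It only prints warnings and does not stop anything.

```shell
# Pre-flight check sketch. Helper names are illustrative, not from the repository.

# 1.2 TB (assumed decimal, 1.2e12 bytes) expressed in the 1024-byte blocks that `df -k` reports.
REQUIRED_KB=$((1200000000000 / 1024))

# Succeeds only when the kernel name is Linux.
check_os() {
    [ "$(uname -s)" = "Linux" ]
}

# Succeeds when the command named in $1 is found on PATH.
check_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the free space, in 1024-byte blocks, of the filesystem holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

check_os || echo "warning: OS is $(uname -s), expected Linux"
check_cmd ffmpeg || echo "warning: ffmpeg not found on PATH"

avail=$(free_kb .)
[ "$avail" -ge "$REQUIRED_KB" ] || echo "warning: ${avail} kB free here, about ${REQUIRED_KB} kB recommended"
```

Run it from the directory where the repository will be cloned, since `free_kb .` measures the filesystem of the current directory.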

Installation

Clone the repository:

git clone https://github.com/cisco-open/lrac_data_generation
cd lrac_data_generation

Download and prepare the datasets: Run the main preparation script shown below, which automates the entire process:
- It downloads the original large-scale corpora. Each downloaded corpus remains available in compressed form in a directory named after the dataset.
- It selects a high-quality subset using our pre-filtered file lists.
- It resamples all selected audio to a 24 kHz sampling rate for compatibility with the baseline model.
- It places all final, ready-to-use data in the ./data directory.
```
. ./prepare_espnet_data.sh
```
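
Once the script has finished, it may be worth spot-checking that the prepared audio really is at 24 kHz. The sketch below is not part of the repository. It assumes that `ffprobe` (shipped with ffmpeg) is installed and that the prepared files under ./data use `.wav` or `.flac` extensions; the helper names are hypothetical.

```shell
# Spot-check sketch: report prepared files whose sampling rate is not 24 kHz.
# File extensions and helper names are assumptions, not taken from the repository.

EXPECTED_RATE=24000

# Prints the sampling rate (in Hz) of the first audio stream of file $1.
probe_rate() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=sample_rate \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeeds when the rate given in $1 matches the expected rate.
is_expected_rate() {
    [ "$1" = "$EXPECTED_RATE" ]
}

# Only run the scan when ffprobe and the ./data directory are both present.
if command -v ffprobe >/dev/null 2>&1 && [ -d ./data ]; then
    # Limit the scan to the first 20 files found to keep the check quick.
    find ./data -type f \( -name '*.wav' -o -name '*.flac' \) | head -n 20 |
    while IFS= read -r f; do
        rate=$(probe_rate "$f")
        is_expected_rate "$rate" || echo "unexpected rate ${rate}: ${f}"
    done
fi
```

No output means every sampled file reported 24000 Hz. Raise or drop the `head -n 20` limit to scan more of the data.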

Data

The datasets used in the challenge are listed at https://lrac.short.gy/datasets

The prepare_espnet_data.sh script downloads and processes the datasets automatically.

All prepared data will be located in the ./data directory.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
