- Rust-based, with test suites
- Binaries are statically linked
Download the latest archive from the Releases page.
- wav-files-convert: Convert MP3, FLAC, OGG, M4A, AAC, WMA, AIFF, AU, MP2 to WAV
- wav-files-augment: Create augmented copies of WAV files by adding noise or shifting pitch
- wav-files-spectrogram: Generate spectrogram images from WAV files
- wav-files-normalize: Normalize WAV files to a target peak level or integrated loudness (LUFS)
- wav-files-format: Standardize sample rate, bit depth, and channels of WAV files
- wav-files-validate: Validate the integrity of WAV files in a folder
- wav-files-concat: Concatenate audio files into one file
- wav-files-chunker: Chunk WAV files into smaller pieces
- wav-files-vad: Extract speech using WebRTC VAD for WAV files
- wav-files-vad-api: Extract speech from WAV files using an API that performs Voice Activity Detection (VAD)
- wav-files-filter: Filter audio files by duration
- wav-files-stats: Calculate statistics for a folder of WAV files
- wav-files-trim: Automatically detect and trim silence from the start/end of WAV files, with optional threshold settings
- wav-files-denoise: Remove background noise from WAV files using the nnnoiseless noise reduction tool
- wav-files-denoise-api: Remove background noise from WAV files using a noise reduction API
- wav-files-tempo: Adjust playback speed/tempo of WAV files without altering pitch (using time-stretching algorithms like WSOLA)
- wav-files-echo: Add echo, reverb, or chorus effects to WAV files using delay-based DSP parameters (e.g., decay time, wet/dry mix)
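
To make the normalization and trimming descriptions above more concrete, here is a minimal Rust sketch of peak normalization and threshold-based silence trimming. It is not the toolkit's implementation: the function names (`db_to_linear`, `normalize_peak`, `trim_silence`) are hypothetical, samples are assumed to be already decoded to `f32` values in the range [-1.0, 1.0], and the trimming uses a simple per-sample amplitude threshold, whereas real tools often measure energy over short windows.

```rust
/// Convert a level in decibels relative to full scale (dBFS) to a linear amplitude.
fn db_to_linear(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

/// Scale samples so the largest absolute value reaches `target_dbfs`.
/// Silent input (peak of 0.0) is returned unchanged to avoid dividing by zero.
fn normalize_peak(samples: &[f32], target_dbfs: f32) -> Vec<f32> {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak == 0.0 {
        return samples.to_vec();
    }
    let gain = db_to_linear(target_dbfs) / peak;
    samples.iter().map(|s| s * gain).collect()
}

/// Return the sub-slice between the first and last sample whose absolute
/// value exceeds `threshold`. An input with no such sample yields an empty slice.
fn trim_silence(samples: &[f32], threshold: f32) -> &[f32] {
    let first = samples.iter().position(|s| s.abs() > threshold);
    let last = samples.iter().rposition(|s| s.abs() > threshold);
    match (first, last) {
        (Some(a), Some(b)) => &samples[a..=b],
        _ => &[],
    }
}
```

For example, `normalize_peak(&samples, -1.0)` would scale a buffer so its loudest sample sits at -1 dBFS, and `trim_silence(&samples, 0.01)` would drop leading and trailing samples quieter than 1% of full scale. Integrated loudness (LUFS) normalization needs a perceptual loudness measurement and is not covered by this sketch.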
The following tools are available in the additional-tools.zip release archive:
- audios-to-dataset: Convert audio files to dataset format
- extract-audio: Extract audio from various sources
- babylonify: Filter parquet files by language
- audio-from-video: Extract audio tracks from video files
- data-viewer-audio: View and inspect audio dataset information
- audio-parquet-merger: Merge multiple audio parquet files
The following external tools are bundled with the toolkit and available in the 3rd-party-bins.zip release archive:
- FFmpeg: A complete, cross-platform solution to record, convert and stream audio and video (static build from johnvansickle.com)
- ffprobe: Multimedia stream analyzer, included with FFmpeg
- parquet-tools: Command-line tools for Apache Parquet files
- nnnoiseless: RNNoise-based noise suppression tool
Open the list
- wav-files-eq: Apply equalization filters to boost/cut specific frequency bands in WAV files. Why? Builds on spectrogram visualization for targeted audio shaping; useful for mastering or voice enhancement.
- wav-files-compress: Apply dynamic range compression to even out loud/soft parts in WAV files, with adjustable ratio/threshold. Why? Pairs with normalization for professional loudness control; prevents clipping in mixed or concatenated files.
- wav-files-metadata: Edit or extract embedded metadata (e.g., artist, title, comments) in WAV files using RIFF chunks. Why? Fills a gap in file handling; integrates with stats and validation for better organization in folders.
- wav-files-waveform: Generate static waveform plot images (PNG/SVG) from WAV files, with customizable styles. Why? Expands visualization beyond spectrograms; quick for previews or reports alongside stats.
- wav-files-fft: Compute and export Fast Fourier Transform (FFT) data as text/CSV for frequency analysis of WAV files. Why? Deeper dive beyond spectrograms for quantitative spectral insights; supports research or automated quality checks.
- wav-files-mix: Overlay or blend multiple WAV files into a single output, with volume balancing and channel mapping (e.g., stereo mixdown)
- wav-files-volume: Adjust overall gain or apply random volume scaling (e.g., ±dB range) to WAV files for dynamic loudness variation. Why for ML? Simulates real-world recording inconsistencies (e.g., microphone distance); pairs with normalization to prevent overfitting in tasks like speaker identification, boosting generalization as seen in torchaudio pipelines.
- wav-files-shift: Perform time-domain shifting by inserting silence or cropping edges to offset audio start/end randomly. Why for ML? Introduces temporal misalignment common in streaming audio; essential for sequence models (e.g., RNNs/LSTMs) in event detection, reducing sensitivity to exact timing as in raw waveform augmentations.
- wav-files-crop: Extract fixed-length random segments (with overlap options) from longer WAV files. Why for ML? Generates uniform-length clips for balanced batching in training; critical for fixed-input models like CNNs on audio spectrograms, as used in environmental sound classification.
- wav-files-mask: Apply time-domain masking by zeroing out random contiguous segments (e.g., SpecAugment-inspired masking applied to the waveform). Why for ML? Encourages models to focus on partial signals, enhancing robustness to occlusions; useful for bioacoustics or music tagging where partial data is common, as in masking strategies used in deep learning.
- wav-files-channel: Swap, drop, or mix stereo channels (e.g., mono conversion with panning) for multi-channel WAV files. Why for ML? Handles channel imbalances in stereo datasets; augments for mono-compatible models, aiding transfer learning in spatial audio tasks like source separation.
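
The waveform-level augmentations described in this list (volume scaling, time shifting, time masking) reduce to a few simple buffer operations. The following Rust sketch illustrates them under stated assumptions: the function names are hypothetical and do not correspond to any existing tool, samples are `f32` values in [-1.0, 1.0], and the random choice of gain, offset, and mask position is left to the caller (for instance via a random number generator crate) so that each function stays deterministic.

```rust
/// Multiply every sample by a gain given in decibels,
/// clamping the result to [-1.0, 1.0] to avoid overflow on export.
fn apply_gain_db(samples: &mut [f32], gain_db: f32) {
    let gain = 10f32.powf(gain_db / 20.0);
    for s in samples.iter_mut() {
        *s = (*s * gain).clamp(-1.0, 1.0);
    }
}

/// Zero out up to `len` samples starting at `start`.
/// The masked range is clipped to the buffer bounds.
fn mask_segment(samples: &mut [f32], start: usize, len: usize) {
    let begin = start.min(samples.len());
    let end = start.saturating_add(len).min(samples.len());
    for s in &mut samples[begin..end] {
        *s = 0.0;
    }
}

/// Delay the audio by `offset` samples: pad the start with silence and
/// drop the samples pushed past the end, so the length stays the same.
fn shift_right(samples: &[f32], offset: usize) -> Vec<f32> {
    let n = samples.len();
    let k = offset.min(n);
    let mut out = vec![0.0; k];
    out.extend_from_slice(&samples[..n - k]);
    out
}
```

A typical augmentation pass would draw `gain_db` from a range such as ±6 dB, `offset` from a few hundred milliseconds' worth of samples, and a mask position and length per clip, then apply the functions in sequence. Those ranges are illustrative only.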
- Yehor Smoliakov (contact: [email protected])
@software{Smoliakov_Wav_Files_Toolkit,
  author = {Smoliakov, Yehor},
  month = oct,
  title = {{WAV Files Toolkit: A suite of command-line tools for common WAV audio processing tasks, including conversion from other formats, data augmentation, loudness normalization, spectrogram generation, and validation.}},
  url = {https://github.com/RustedBytes/wav-files-toolkit},
  version = {0.4.0},
  year = {2025}
}