Existing Features
Safe currently provides the following feature extraction algorithms and parameters:
- CQT - Constant-Q Transform [1]
[1] Judith C. Brown and Miller S. Puckette. An efficient algorithm for the calculation of a constant Q transform. The Journal of the Acoustical Society of America, 92:2698, 1992.
- sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
- stepSize - The step size (number of samples) for the framing function (default = 512)
- windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
- binsPerOctave - The number of CQT bins/octave (default = 24)
- maxFreq - Maximum frequency (Hz) to analyze (default = 12543.854)
- minFreq - Minimum frequency (Hz) to analyze (default = 16.351599)
- threshold - Minimum threshold (default = 0.0054)
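To illustrate how minFreq, maxFreq, and binsPerOctave interact, here is a minimal sketch that derives geometrically spaced CQT center frequencies and the constant Q factor, following the general definition in [1]. The function names are hypothetical and are not part of Safe's API; Safe's own bin layout may differ.

```python
import math

def cqt_bin_frequencies(min_freq=16.351599, max_freq=12543.854, bins_per_octave=24):
    """Return geometrically spaced CQT center frequencies in Hz (illustrative only)."""
    # Number of bins needed to span the range from min_freq up to max_freq.
    n_bins = math.ceil(bins_per_octave * math.log2(max_freq / min_freq))
    # Bin k sits k / bins_per_octave octaves above min_freq.
    return [min_freq * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def quality_factor(bins_per_octave=24):
    """Ratio of center frequency to bandwidth, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

freqs = cqt_bin_frequencies()
print(len(freqs), freqs[0], freqs[-1])
print(quality_factor())
```

With the defaults above, the range runs from roughly C0 to G9, so a larger binsPerOctave raises both the bin count and the Q factor.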
- MFCC - Mel-Frequency Cepstral Coefficients [2]
[2] Steven Davis and Paul Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. Acoustics, Speech and Signal Processing, IEEE Transactions on, 28(4):357–366, 1980.
- sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
- frameSize - Frame size (number of samples) for the framing function (default = 1024)
- stepSize - The step size (number of samples) for the framing function (default = 512)
- windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
- numCoeffs - Number of cepstral coefficients to extract (default = 13)
- melFilters - Number of mel filter banks to use (default = 40)
- minFreq - Minimum frequency (Hz) for the filter bank (default = 130.0)
- maxFreq - Maximum frequency (Hz) for the filter bank (default = 6854.0)
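As a rough illustration of what melFilters, minFreq, and maxFreq control, the sketch below places filter center frequencies evenly on the mel scale. It assumes the common 2595 * log10(1 + f / 700) mel formula; Safe may use a different variant, and the function names here are hypothetical.

```python
import math

def hz_to_mel(hz):
    # Common mel-scale formula (an assumption, not confirmed for Safe).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters=40, min_freq=130.0, max_freq=6854.0):
    """Return center frequencies (Hz) of filters spaced evenly in mels."""
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    # num_filters triangular filters need num_filters + 2 edge points;
    # the interior points are the filter centers.
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, num_filters + 1)]

centers = mel_center_frequencies()
print(len(centers), centers[0], centers[-1])
```

Because the spacing is uniform in mels, the centers are packed more densely at low frequencies than at high ones.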
- SpectralShape - A combination of four spectral features: Centroid, Spread, Skewness, and Kurtosis [3]
[3] Olivier Gillet and Gaël Richard. Automatic transcription of drum loops. In Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP’04). IEEE International Conference on, volume 4, pages iv–269. IEEE, 2004.
- sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
- frameSize - Frame size (number of samples) for the framing function (default = 1024)
- stepSize - The step size (number of samples) for the framing function (default = 512)
- windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
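The four SpectralShape features can be read as the first four statistical moments of a frame's magnitude spectrum, treated as a distribution over frequency. The sketch below shows one common formulation; it is an assumption that Safe normalizes the same way (for example, some definitions report spread as a variance rather than a standard deviation), and the function name is hypothetical.

```python
import math

def spectral_shape(magnitudes, sample_rate=44100, frame_size=1024):
    """Return (centroid, spread, skewness, kurtosis) for one magnitude spectrum.

    magnitudes[k] is the magnitude of FFT bin k of a frame_size-point transform.
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0, 0.0, 0.0  # silent frame: no meaningful shape
    bin_hz = sample_rate / frame_size
    freqs = [k * bin_hz for k in range(len(magnitudes))]
    weights = [m / total for m in magnitudes]  # normalize to sum to 1
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = math.sqrt(variance)
    if spread == 0:
        return centroid, 0.0, 0.0, 0.0  # all energy in a single bin
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis

# Symmetric toy spectrum with 1 Hz bin spacing.
shape = spectral_shape([0.0, 1.0, 2.0, 1.0, 0.0], sample_rate=8, frame_size=8)
print(shape)
```

A symmetric spectrum such as the toy input has zero skewness, and its centroid falls on the middle bin.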
- SpectralFlux - Spectral Flux [4]
[4] Simon Dixon. Onset detection revisited. In Proceedings of the 9th International Conference on Digital Audio Effects, volume 120, pages 133–137, 2006.
- sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
- frameSize - Frame size (number of samples) for the framing function (default = 1024)
- stepSize - The step size (number of samples) for the framing function (default = 512)
- windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
- diffLength - Number of frames (n) between the two frames being compared; 1 = consecutive frames (default = 1)
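To make the role of diffLength concrete, the following sketch computes a half-wave rectified spectral flux, one of the variants discussed in [4], between frames that are diff_length apart. It assumes the magnitude spectra are already computed; whether Safe rectifies or normalizes in this way is not stated here, and the function name is hypothetical.

```python
def spectral_flux(frames, diff_length=1):
    """Return one flux value per frame pair (frame n vs. frame n - diff_length).

    frames is a list of magnitude spectra (lists of equal length).
    """
    flux = []
    for n in range(diff_length, len(frames)):
        current, previous = frames[n], frames[n - diff_length]
        # Half-wave rectification: count only bins whose magnitude increased.
        flux.append(sum(max(c - p, 0.0) for c, p in zip(current, previous)))
    return flux

toy_frames = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
print(spectral_flux(toy_frames, diff_length=1))
print(spectral_flux(toy_frames, diff_length=2))
```

A larger diff_length compares frames further apart in time, which yields fewer output values and tends to emphasize slower spectral changes.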
- SpectralOnsets - Spectral Onset Detection [5]
[5] Sebastian Böck, Florian Krebs, and Markus Schedl. Evaluating the online capabilities of onset detection methods. In ISMIR, pages 49–54, 2012.
- sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
- frameSize - Frame size (number of samples) for the framing function (default = 1024)
- stepSize - The step size (number of samples) for the framing function (default = 512)
- windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
- ratio - Minimum activation ratio for windowing function (default = 0.22)
- threshold - Minimum threshold for peak-picking (default = 2.5)
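As a simplified illustration of how a peak-picking threshold might be applied, the sketch below marks a frame as an onset when its detection-function value is a local maximum at or above the threshold, then converts the frame index to seconds using stepSize and sampleRate. This is a generic scheme under stated assumptions, not the peak-picking method from [5] and not necessarily what Safe does; it ignores the ratio parameter, and the function name is hypothetical.

```python
def pick_onsets(detection, threshold=2.5, step_size=512, sample_rate=44100):
    """Return onset times (seconds) for local maxima at or above threshold.

    detection holds one onset-strength value per frame.
    """
    onsets = []
    for n in range(1, len(detection) - 1):
        # A peak rises above its left neighbor and does not fall below its right one.
        is_peak = detection[n] > detection[n - 1] and detection[n] >= detection[n + 1]
        if is_peak and detection[n] >= threshold:
            onsets.append(n * step_size / sample_rate)
    return onsets

toy_detection = [0.0, 3.0, 1.0, 2.0, 0.0, 5.0, 4.0]
print(pick_onsets(toy_detection))
```

In the toy input, the local maximum of 2.0 is discarded because it falls below the threshold, while the peaks at frames 1 and 5 are kept.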