Existing Features

Safe currently provides the following feature extraction algorithms and parameters:

CQT - Constant-Q Transform [1]

[1] Judith C Brown and Miller S Puckette. An efficient algorithm for the calculation of a constant q transform. The Journal of the Acoustical Society of America, 92:2698, 1992.

sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
stepSize - The step size (number of samples) for the framing function (default = 512)
windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
binsPerOctave - The number of CQT bins/octave (default = 24)
maxFreq - Maximum frequency (Hz) to analyze (default = 12543.854)
minFreq - Minimum frequency (Hz) to analyze (default = 16.351599)
threshold - Minimum threshold (default = 0.0054)
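
To illustrate how minFreq, maxFreq, and binsPerOctave interact, here is a minimal sketch that lists geometrically spaced CQT center frequencies. The function name cqt_bin_frequencies is hypothetical and not part of Safe, and whether Safe includes a bin at maxFreq itself is an assumption (this sketch stops just below it).

```python
import math

def cqt_bin_frequencies(min_freq=16.351599, max_freq=12543.854, bins_per_octave=24):
    """Hypothetical helper: center frequencies from min_freq up to (not including) max_freq."""
    # Number of bins needed to span the range at the given resolution.
    n_bins = int(round(bins_per_octave * math.log2(max_freq / min_freq)))
    # Each bin is a fixed ratio 2^(1/bins_per_octave) above the previous one.
    return [min_freq * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

freqs = cqt_bin_frequencies()

# Quality factor implied by the bin spacing: Q = 1 / (2^(1/b) - 1).
q_factor = 1.0 / (2.0 ** (1.0 / 24) - 1.0)
```

With the defaults the range runs from roughly C0 to G9, so the bin count is driven mainly by binsPerOctave: doubling it doubles the number of bins.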

MFCC - Mel-Frequency Cepstral Coefficients [2]

[2] Steven Davis and Paul Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. Acoustics, Speech and Signal Processing, IEEE Transactions on, 28(4):357–366, 1980.

sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
frameSize - Frame size (number of samples) for the framing function (default = 1024)
stepSize - The step size (number of samples) for the framing function (default = 512)
windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
numCoeffs - Number of cepstral coefficients to extract (default = 13)
melFilters - Number of mel filter banks to use (default = 40)
minFreq - Minimum frequency (Hz) for the filter bank (default = 130.0)
maxFreq - Maximum frequency (Hz) for the filter bank (default = 6854.0)
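
As a rough illustration of how melFilters, minFreq, and maxFreq define the filter bank, the sketch below spaces filter centers evenly on the mel scale. It assumes the common 2595 * log10(1 + f/700) mel formula; Safe may use a different variant, and the function names are hypothetical.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(min_freq=130.0, max_freq=6854.0, mel_filters=40):
    """Hypothetical helper: center frequency (Hz) of each triangular mel filter."""
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    # mel_filters triangles need mel_filters + 2 edge points, so the
    # centers are the interior points of an evenly spaced mel grid.
    step = (hi - lo) / (mel_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(mel_filters)]

centers = mel_center_frequencies()
```

The centers are dense at low frequencies and sparse at high frequencies, which is the point of the mel warping.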

SpectralShape - A combination of four spectral features: Centroid, Spread, Skewness, and Kurtosis [3]

[3] Olivier Gillet and Gaël Richard. Automatic transcription of drum loops. In Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04). IEEE International Conference on, volume 4, pages iv–269. IEEE, 2004.

sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
frameSize - Frame size (number of samples) for the framing function (default = 1024)
stepSize - The step size (number of samples) for the framing function (default = 512)
windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
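
The four SpectralShape values can be read as the first four statistical moments of a frame's magnitude spectrum. The sketch below assumes the spectrum is normalized to sum to 1 and that spread is reported as a standard deviation; Safe's exact definitions may differ, and spectral_shape is a hypothetical name.

```python
import math

def spectral_shape(magnitudes, sample_rate=44100, frame_size=1024):
    """Hypothetical helper: (centroid, spread, skewness, kurtosis) for one frame.

    magnitudes holds the magnitude of FFT bins 0..N for a single frame.
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0, 0.0, 0.0  # silent frame
    # Frequency (Hz) of each bin, and the spectrum as a probability mass.
    freqs = [k * sample_rate / frame_size for k in range(len(magnitudes))]
    p = [m / total for m in magnitudes]
    centroid = sum(f * w for f, w in zip(freqs, p))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, p))
    spread = math.sqrt(variance)
    if spread == 0:
        return centroid, 0.0, 0.0, 0.0  # all energy in a single bin
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, p)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, p)) / spread ** 4
    return centroid, spread, skewness, kurtosis
```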

SpectralFlux - Spectral Flux [4]

[4] Simon Dixon. Onset detection revisited. In Proceedings of the 9th International Conference on Digital Audio Effects, volume 120, pages 133–137, 2006.

sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
frameSize - Frame size (number of samples) for the framing function (default = 1024)
stepSize - The step size (number of samples) for the framing function (default = 512)
windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
diffLength - Number of frames between the two frames being compared; 1 = consecutive frames (default = 1)
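
The role of diffLength can be sketched as follows. This assumes the half-wave rectified difference described by Dixon [4], where only increases in bin magnitude contribute; Safe's implementation may use a different norm, and spectral_flux is a hypothetical name.

```python
def spectral_flux(frames, diff_length=1):
    """Hypothetical helper: flux between each frame and the one diff_length earlier.

    frames is a list of magnitude spectra (one list of bin magnitudes per frame).
    """
    flux = []
    for n in range(diff_length, len(frames)):
        current, previous = frames[n], frames[n - diff_length]
        # Half-wave rectification: keep only bins whose magnitude increased.
        flux.append(sum(max(c - p, 0.0) for c, p in zip(current, previous)))
    return flux

frames = [[0.0, 0.0], [1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
consecutive = spectral_flux(frames, diff_length=1)
two_apart = spectral_flux(frames, diff_length=2)
```

Note that the output has diff_length fewer values than there are input frames, since the first frames have nothing to be compared against.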

SpectralOnsets - Spectral Onset Detection [5]

[5] Sebastian Böck, Florian Krebs, and Markus Schedl. Evaluating the online capabilities of onset detection methods. In ISMIR, pages 49–54, 2012.

sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
frameSize - Frame size (number of samples) for the framing function (default = 1024)
stepSize - The step size (number of samples) for the framing function (default = 512)
windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
ratio - Minimum activation ratio for windowing function (default = 0.22)
threshold - Minimum threshold for peak-picking (default = 2.5)
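
To show how a peak-picking threshold and stepSize relate detected frames to onset times, here is a deliberately simple sketch. It is not Safe's algorithm: it assumes an onset activation value per frame is already available, treats any local maximum at or above the threshold as an onset, and ignores the ratio parameter. The name pick_onsets is hypothetical.

```python
def pick_onsets(activation, threshold=2.5, step_size=512, sample_rate=44100):
    """Hypothetical helper: onset times (seconds) from a per-frame activation curve."""
    onsets = []
    for n in range(1, len(activation) - 1):
        # A peak rises above its left neighbor and is not below its right one.
        is_peak = activation[n] > activation[n - 1] and activation[n] >= activation[n + 1]
        if is_peak and activation[n] >= threshold:
            # Frame n starts at sample n * step_size.
            onsets.append(n * step_size / sample_rate)
    return onsets

onset_times = pick_onsets([0.0, 3.0, 1.0, 2.0, 1.0, 5.0, 0.0])
```

With the default stepSize and sampleRate, consecutive frames are about 11.6 ms apart, which bounds the time resolution of the detected onsets.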
