devonbryant edited this page Mar 31, 2014 · 3 revisions

Safe currently provides the following feature extraction algorithms and parameters:

  1. CQT - Constant-Q Transform [1]

[1] Judith C. Brown and Miller S. Puckette. An efficient algorithm for the calculation of a constant Q transform. The Journal of the Acoustical Society of America, 92:2698, 1992.

  • sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
  • stepSize - The step size (number of samples) for the framing function (default = 512)
  • windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
  • binsPerOctave - The number of CQT bins/octave (default = 24)
  • maxFreq - Maximum frequency (Hz) to analyze (default = 12543.854)
  • minFreq - Minimum frequency (Hz) to analyze (default = 16.351599)
  • threshold - Minimum threshold (default = 0.0054)
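The parameters above fix the CQT bin layout. As a sketch, assuming the standard constant-Q definitions rather than Safe's actual source, bin k sits at minFreq · 2^(k/binsPerOctave), every bin shares the quality factor Q = 1/(2^(1/binsPerOctave) - 1), and bin k's analysis window is about Q · sampleRate / f_k samples long. The function names `cqt_bins` and `cqt_coefficient` are hypothetical. `cqt_coefficient` evaluates a single bin by direct summation with a Hann window; the efficient algorithm in [1] instead computes all bins from one FFT using a precomputed spectral kernel.

```python
import cmath
import math


def cqt_bins(min_freq=16.351599, max_freq=12543.854, bins_per_octave=24, sample_rate=44100):
    """Return (Q, centre frequencies, window lengths) for a constant-Q bin layout."""
    # Shared quality factor: centre frequency divided by bandwidth
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Enough geometrically spaced bins to cover min_freq..max_freq
    num_bins = math.ceil(bins_per_octave * math.log2(max_freq / min_freq))
    freqs = [min_freq * 2.0 ** (k / bins_per_octave) for k in range(num_bins)]
    # Each window holds about Q cycles of its bin, so low bins get long windows
    lengths = [math.ceil(q * sample_rate / f) for f in freqs]
    return q, freqs, lengths


def cqt_coefficient(x, freq, q, sample_rate):
    """Directly evaluate one constant-Q coefficient over the first N samples of x."""
    n_len = math.ceil(q * sample_rate / freq)
    if len(x) < n_len:
        raise ValueError("signal is shorter than this bin's window")
    total = 0j
    for n in range(n_len):
        # Hann window, matching the default windowType
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_len - 1))
        # Complex exponential completing Q cycles over the window
        total += w * x[n] * cmath.exp(-2j * math.pi * q * n / n_len)
    return total / n_len


if __name__ == "__main__":
    q, freqs, lengths = cqt_bins()
    print(f"Q={q:.2f}, bins={len(freqs)}, longest window={lengths[0]} samples")
    # A 440 Hz tone responds strongly at 440 Hz and only weakly an octave higher
    sr = 8000
    tone = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(sr)]
    q12 = 1.0 / (2.0 ** (1.0 / 12) - 1.0)
    print(abs(cqt_coefficient(tone, 440.0, q12, sr)), abs(cqt_coefficient(tone, 880.0, q12, sr)))
```

With the defaults this layout has 230 bins, and the lowest bin needs a window of roughly 92,000 samples (a little over two seconds at 44.1 kHz), which is why low-frequency constant-Q analysis is expensive.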
  2. MFCC - Mel-Frequency Cepstral Coefficients [2]

[2] Steven Davis and Paul Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4):357–366, 1980.

  • sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
  • frameSize - Frame size (number of samples) for the framing function (default = 1024)
  • stepSize - The step size (number of samples) for the framing function (default = 512)
  • windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
  • numCoeffs - Number of cepstral coefficients to extract (default = 13)
  • melFilters - Number of mel filter banks to use (default = 40)
  • minFreq - Minimum frequency (Hz) for the filter bank (default = 130.0)
  • maxFreq - Maximum frequency (Hz) for the filter bank (default = 6854.0)
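A minimal MFCC sketch for one frame's power spectrum, mapping the parameters above onto the usual pipeline: triangular filters equally spaced on the mel scale between minFreq and maxFreq, the log energy of each filter, then a DCT-II that keeps the first numCoeffs values. The mel formula (2595 · log10(1 + f/700)), the triangular filter shape, the 1e-10 log floor, and the unnormalised DCT are assumptions; Safe may differ in these details, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, frame_size, sample_rate, min_freq, max_freq):
    """Triangular filters over the frame_size // 2 + 1 non-negative FFT bins."""
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    # num_filters + 2 equally spaced mel points give each filter a left, centre, and right edge
    edges = [mel_to_hz(lo + i * (hi - lo) / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_freqs = [k * sample_rate / frame_size for k in range(frame_size // 2 + 1)]
    bank = []
    for left, centre, right in zip(edges, edges[1:], edges[2:]):
        row = []
        for f in bin_freqs:
            if left < f <= centre:
                row.append((f - left) / (centre - left))  # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(power_spectrum, sample_rate=44100, num_coeffs=13, mel_filters=40,
         min_freq=130.0, max_freq=6854.0):
    """MFCCs for one frame, given its frame_size // 2 + 1 power-spectrum values."""
    frame_size = 2 * (len(power_spectrum) - 1)
    bank = mel_filterbank(mel_filters, frame_size, sample_rate, min_freq, max_freq)
    # Log energy per filter; the small floor avoids log(0) on silent frames
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), 1e-10))
             for row in bank]
    m = len(log_e)
    # DCT-II of the log energies; keep only the first num_coeffs coefficients
    return [sum(e * math.cos(math.pi * i * (j + 0.5) / m) for j, e in enumerate(log_e))
            for i in range(num_coeffs)]


if __name__ == "__main__":
    flat = [1.0] * 513  # power spectrum of a frameSize = 1024 frame
    print(mfcc(flat))
```

Because the log turns a gain change into a constant offset, scaling the whole spectrum only shifts the first coefficient; the higher coefficients describe the spectral envelope's shape independently of loudness.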
  3. SpectralShape - A combination of four spectral features: Centroid, Spread, Skewness, and Kurtosis [3]

[3] Olivier Gillet and Gaël Richard. Automatic transcription of drum loops. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), volume 4, pages iv–269. IEEE, 2004.

  • sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
  • frameSize - Frame size (number of samples) for the framing function (default = 1024)
  • stepSize - The step size (number of samples) for the framing function (default = 512)
  • windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
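The four shape features treat a frame's magnitude spectrum as a distribution over frequency: the centroid is its mean, the spread its standard deviation, and skewness and kurtosis its normalised third and fourth central moments. The sketch below assumes magnitude (not power) weighting and plain (non-excess) kurtosis, and shows how frameSize, stepSize, and a Hann windowType would feed it. The function names are hypothetical, and the naive DFT is there only to keep the example self-contained.

```python
import cmath
import math


def frames(signal, frame_size=1024, step_size=512):
    """Overlapping frames, step_size samples apart; a trailing partial frame is dropped."""
    return [signal[s:s + frame_size]
            for s in range(0, len(signal) - frame_size + 1, step_size)]


def magnitude_spectrum(frame):
    """Hann-windowed magnitudes for bins 0..N/2 via a naive O(N^2) DFT (use an FFT in practice)."""
    n = len(frame)
    xs = [(0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))) * x for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(xs)))
            for k in range(n // 2 + 1)]


def spectral_shape(mags, sample_rate=44100):
    """Return (centroid, spread, skewness, kurtosis) of one magnitude spectrum."""
    frame_size = 2 * (len(mags) - 1)
    freqs = [k * sample_rate / frame_size for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0, 0.0, 0.0  # silent frame: shape is undefined
    centroid = sum(f * a for f, a in zip(freqs, mags)) / total

    def moment(p):
        # p-th central moment, weighting each bin by its magnitude
        return sum((f - centroid) ** p * a for f, a in zip(freqs, mags)) / total

    spread = math.sqrt(moment(2))
    if spread == 0:
        return centroid, 0.0, 0.0, 0.0  # all energy in a single bin
    return centroid, spread, moment(3) / spread ** 3, moment(4) / spread ** 4


if __name__ == "__main__":
    sr = 8000
    signal = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(1024)]
    for frame in frames(signal, frame_size=256, step_size=128):
        print(spectral_shape(magnitude_spectrum(frame), sample_rate=sr))
```

For a pure tone the centroid lands near the tone's frequency and the spread stays small; broadband or noisy frames push the spread up and change the skewness and kurtosis.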
  4. SpectralFlux - Spectral Flux [4]

[4] Simon Dixon. Onset detection revisited. In Proceedings of the 9th International Conference on Digital Audio Effects, volume 120, pages 133–137, 2006.

  • sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
  • frameSize - Frame size (number of samples) for the framing function (default = 1024)
  • stepSize - The step size (number of samples) for the framing function (default = 512)
  • windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
  • diffLength - Number of frames between the two frames being compared; 1 = consecutive frames (default = 1)
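Following the formulation in [4], spectral flux sums the half-wave rectified increase in each bin's magnitude between two frames, so only rising energy counts. A minimal sketch over precomputed magnitude spectra, with `diff_length` playing the role of diffLength; the function name and the choice to return one value per frame from index diff_length onward (rather than padding the first frames) are assumptions.

```python
def spectral_flux(spectra, diff_length=1):
    """Half-wave rectified spectral flux between frames diff_length apart.

    spectra: list of magnitude spectra (one list of bin magnitudes per frame).
    Returns len(spectra) - diff_length values; entry i compares frame
    i + diff_length with frame i.
    """
    if diff_length < 1:
        raise ValueError("diff_length must be at least 1")
    flux = []
    for n in range(diff_length, len(spectra)):
        prev, cur = spectra[n - diff_length], spectra[n]
        # H(x) = max(x, 0): only bins whose magnitude grew contribute
        flux.append(sum(max(c - p, 0.0) for c, p in zip(cur, prev)))
    return flux


if __name__ == "__main__":
    spectra = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 0.0]]
    print(spectral_flux(spectra))     # [1.0, 2.0, 0.0]
    print(spectral_flux(spectra, 2))  # [3.0, 0.0]
```

The last frame, where energy falls back to zero, contributes nothing: the rectification is what makes the flux respond to onsets rather than offsets.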
  5. SpectralOnsets - Spectral Onset Detection [5]

[5] Sebastian Böck, Florian Krebs, and Markus Schedl. Evaluating the online capabilities of onset detection methods. In ISMIR, pages 49–54, 2012.

  • sampleRate - Target (expected) sample rate of audio inputs (default = 44100)
  • frameSize - Frame size (number of samples) for the framing function (default = 1024)
  • stepSize - The step size (number of samples) for the framing function (default = 512)
  • windowType - Windowing function: bartlett, blackman, blackmanHarris, hamming, or hann (default = hann)
  • ratio - Minimum activation ratio for windowing function (default = 0.22)
  • threshold - Minimum threshold for peak-picking (default = 2.5)
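An onset detector of this kind computes a detection function (such as spectral flux) and then peak-picks it. The sketch below shows one assumed reading of the two onset-specific parameters, not confirmed against Safe's source. Under this reading, `ratio` sets how many frames back the difference is taken: the distance from the window's centre to the first sample where the window exceeds `ratio`, converted to frames. `threshold` is the margin by which a local maximum must exceed the local mean to count as an onset. The function names and the peak-picking window sizes (`pre_avg`, `post_avg`, `pre_max`, `post_max`) are hypothetical.

```python
import math


def hann(n):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def diff_frames_from_ratio(frame_size=1024, step_size=512, ratio=0.22):
    """Assumed use of `ratio`: compare frames whose windows still overlap at `ratio` of full weight."""
    window = hann(frame_size)  # peaks at (almost) 1.0
    # First sample where the window rises above ratio
    first = next(i for i, w in enumerate(window) if w > ratio)
    # Distance from that sample to the window centre, converted to frames
    diff_samples = frame_size / 2 - first
    return max(1, round(diff_samples / step_size))


def pick_peaks(activations, threshold=2.5, pre_avg=3, post_avg=3, pre_max=1, post_max=1):
    """Indices that are local maxima and exceed the local mean by at least threshold."""
    onsets = []
    for n, a in enumerate(activations):
        around_max = activations[max(0, n - pre_max):n + post_max + 1]
        around_avg = activations[max(0, n - pre_avg):n + post_avg + 1]
        mean = sum(around_avg) / len(around_avg)
        if a == max(around_max) and a >= mean + threshold:
            onsets.append(n)
    return onsets


if __name__ == "__main__":
    print(diff_frames_from_ratio())                     # 1
    print(pick_peaks([0, 0, 10, 0, 0, 0, 0, 5, 0, 0]))  # [2, 7]
```

Under this reading the defaults (frameSize = 1024, stepSize = 512, ratio = 0.22) compare consecutive frames, while a smaller stepSize of 128 would compare frames 3 apart, keeping the compared windows at roughly the same overlap regardless of hop size.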