streamline audio feature extraction

since r2019b

description

audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

creation

syntax

afe = audioFeatureExtractor()

afe = audioFeatureExtractor(Name=Value)

description

afe = audioFeatureExtractor() creates an audio feature extractor with default property values.

afe = audioFeatureExtractor(Name=Value) specifies nondefault properties for afe using one or more name-value arguments.

properties

main properties

`window` — analysis window
`hamming(1024,"periodic")` (default) | real vector

analysis window, specified as a real vector.

data types: single | double

`overlaplength` — overlap length of adjacent analysis windows
`512` (default) | integer in the range [0, `numel(window)`)

overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(window)).

data types: single | double
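
The window length and overlap length together fix the hop size, and with it the number of rows that extract returns. The following is a minimal sketch of that arithmetic, assuming that only complete frames are kept (no padding of a trailing partial frame). The sample count is taken from the first file listed in the datastore example on this page, and the variable names are illustrative.

```matlab
% Hop length and frame count implied by the default Window and OverlapLength.
winLength = 1024;                      % numel(hamming(1024,"periodic"))
overlapLength = 512;                   % default OverlapLength
hopLength = winLength - overlapLength; % samples between successive frames

numSamples = 539648;                   % length of the first file in the datastore example
numHops = floor((numSamples - winLength)/hopLength) + 1
```

For that file the datastore example reports 1053 hops, which agrees with the value computed here.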

`fftlength` — fft length
`[]` (default) | positive integer

FFT length, specified as a positive integer. The default value of [] means that the FFT length is equal to the window length, numel(Window).

data types: single | double

`samplerate` — input sample rate (hz)
`44100` (default) | positive scalar

input sample rate in hz, specified as a positive scalar.

data types: single | double

`spectraldescriptorinput` — input to spectral descriptors
`"linearspectrum"` (default) | `"melspectrum"` | `"barkspectrum"` | `"erbspectrum"`

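Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Spectral descriptors affected by this property are: spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.

The spectrum input to the spectral descriptors is the same as the output from the corresponding feature: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

For example, if you set SpectralDescriptorInput to "barkSpectrum", and spectralCentroid to true, then afe returns the centroid of the default Bark spectrum.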

[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
afe = audioFeatureExtractor(SampleRate=fs, ...
                            SpectralDescriptorInput="barkSpectrum", ...
                            spectralCentroid=true);
barkspectralcentroid = extract(afe,audioin);

If you specify a nondefault barkSpectrum using setExtractorParameters, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParameters(afe,"barkSpectrum",NumBands=40), then afe returns the centroid of a 40-band Bark spectrum.

setExtractorParameters(afe,"barkSpectrum",NumBands=40)
bark40spectralcentroid = extract(afe,audioin);

data types: char | string

`featurevectorlength` — number of features output from extract
positive integer

This property is read-only.

Total number of features output from extract for the current object configuration, specified as a positive integer. FeatureVectorLength is equal to the second dimension of the output from the extract function.

data types: single | double
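
As a quick check of the relationship between FeatureVectorLength and the output of extract, the sketch below enables two features and compares the property with the number of columns returned. The noise signal and the 16 kHz sample rate are arbitrary stand-ins chosen for illustration.

```matlab
fs = 16e3;
x = randn(fs,1);                       % one second of noise as a stand-in signal

% 13 MFCC coefficients (the default NumCoeffs) plus one spectral centroid column.
afe = audioFeatureExtractor(SampleRate=fs,mfcc=true,spectralCentroid=true);
features = extract(afe,x);

numFeatures = size(features,2);
isequal(numFeatures,afe.FeatureVectorLength)
```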

features to extract

`linearspectrum` — extract linear spectrum
`false` (default) | `true`

Extract the one-sided linear spectrum, specified as true or false.

To set parameters of the linear spectrum extraction, use setExtractorParameters:

setExtractorParameters(afe,"linearSpectrum",Name=Value)

Settable parameters for the linear spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.

data types: logical
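
The sketch below illustrates how the linear spectrum parameters change the output, assuming a noise signal at an arbitrary 16 kHz sample rate. It extracts the default spectrum, then restricts FrequencyRange and switches SpectrumType, and compares the number of bins returned.

```matlab
fs = 16e3;
x = randn(fs,1);                       % one second of noise as a stand-in signal

afe = audioFeatureExtractor(SampleRate=fs,linearSpectrum=true);
fullSpectrum = extract(afe,x);         % default: power spectrum over [0, fs/2]

% Restrict the output to 100-4000 Hz and switch to a magnitude spectrum.
setExtractorParameters(afe,"linearSpectrum", ...
    FrequencyRange=[100 4000],SpectrumType="magnitude")
bandSpectrum = extract(afe,x);

size(fullSpectrum,2)                   % bins in the full one-sided spectrum
size(bandSpectrum,2)                   % fewer bins after restricting the range
```

The number of rows (hops) is unchanged, because the framing depends only on Window and OverlapLength.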

`melspectrum` — extract mel spectrum
`false` (default) | `true`

Extract the one-sided mel spectrum, specified as true or false.

To set parameters of the mel spectrum extraction, use setExtractorParameters:

setExtractorParameters(afe,"melSpectrum",Name=Value)

Settable parameters for the mel spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands –– Number of mel bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain –– Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".

data types: logical

`barkspectrum` — extract bark spectrum
`false` (default) | `true`

Extract the one-sided Bark spectrum, specified as true or false.

To set parameters of the Bark spectrum extraction, use setExtractorParameters:

setExtractorParameters(afe,"barkSpectrum",Name=Value)

Settable parameters for the Bark spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands –– Number of Bark bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain –– Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".

data types: logical

`erbspectrum` — extract erb spectrum
`false` (default) | `true`

Extract the one-sided ERB spectrum, specified as true or false.

To set parameters of the ERB spectrum extraction, use setExtractorParameters:

setExtractorParameters(afe,"erbSpectrum",Name=Value)

Settable parameters for the ERB spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands –– Number of ERB bands, specified as an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.

data types: logical
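
Unlike the mel and Bark spectra, the default number of ERB bands depends on the frequency range: it is the ERB-scale distance between the two edges of FrequencyRange, rounded up. The sketch below evaluates that expression with hz2erb for the default sample rate and compares it with the width of the extracted ERB spectrum. The noise input is an arbitrary stand-in.

```matlab
fs = 44100;
frequencyRange = [0 fs/2];             % default FrequencyRange

% Default number of ERB bands: ERB-scale span of the frequency range, rounded up.
expectedBands = ceil(hz2erb(frequencyRange(2)) - hz2erb(frequencyRange(1)));

afe = audioFeatureExtractor(SampleRate=fs,erbSpectrum=true);
x = randn(fs,1);                       % one second of noise as a stand-in signal
erbSpec = extract(afe,x);

size(erbSpec,2) == expectedBands
```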

`mfcc` — extract mel-frequency cepstral coefficients (mfcc)
`false` (default) | `true`

Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.

To set parameters of the MFCC extraction, use setExtractorParameters:

setExtractorParameters(afe,"mfcc",Name=Value)

Settable parameters for the MFCC extraction are:

NumCoeffs –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength –– Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.
Rectification –– Type of nonlinear rectification, specified as "log" or "cubic-root".

The mel-frequency cepstral coefficients are calculated using the melSpectrum.

data types: logical
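
Because the delta features are derived from the MFCC, changing NumCoeffs changes the width of every MFCC-based feature. A minimal sketch, assuming an arbitrary 16 kHz sample rate, that sets two MFCC parameters and inspects the resulting column mapping with info:

```matlab
fs = 16e3;
afe = audioFeatureExtractor(SampleRate=fs,mfcc=true,mfccDelta=true);

% Request 20 coefficients and a shorter delta window (odd and greater than 2).
setExtractorParameters(afe,"mfcc",NumCoeffs=20,DeltaWindowLength=5)

idx = info(afe);
numel(idx.mfcc)                        % columns occupied by the MFCC
numel(idx.mfccDelta)                   % columns occupied by the delta MFCC
```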

`mfccdelta` — extract delta of mfcc
`false` (default) | `true`

Extract delta of MFCC, specified as true or false.

The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.

data types: logical

`mfccdeltadelta` — extract delta-delta of mfcc
`false` (default) | `true`

Extract delta-delta of MFCC, specified as true or false.

The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.

data types: logical

`gtcc` — extract gammatone cepstral coefficients (gtcc)
`false` (default) | `true`

Extract gammatone cepstral coefficients (GTCC), specified as true or false.

To set parameters of the GTCC extraction, use setExtractorParameters:

setExtractorParameters(afe,"gtcc",Name=Value)

Settable parameters for the GTCC extraction are:

NumCoeffs –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength –– Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.
Rectification –– Type of nonlinear rectification, specified as "log" or "cubic-root".

The gammatone cepstral coefficients are calculated using the erbSpectrum.

data types: logical

`gtccdelta` — extract delta of gtcc
`false` (default) | `true`

Extract delta of GTCC, specified as true or false.

The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.

data types: logical

`gtccdeltadelta` — extract delta-delta of gtcc
`false` (default) | `true`

Extract delta-delta of GTCC, specified as true or false.

The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.

data types: logical

`spectralcentroid` — extract spectral centroid
`false` (default) | `true`

Extract spectral centroid, specified as true or false.

The spectral centroid is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralcrest` — extract spectral crest
`false` (default) | `true`

Extract spectral crest, specified as true or false.

The spectral crest is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectraldecrease` — extract spectral decrease
`false` (default) | `true`

Extract spectral decrease, specified as true or false.

The spectral decrease is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralentropy` — extract spectral entropy
`false` (default) | `true`

Extract spectral entropy, specified as true or false.

The spectral entropy is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralflatness` — extract spectral flatness
`false` (default) | `true`

Extract spectral flatness, specified as true or false.

The spectral flatness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralflux` — extract spectral flux
`false` (default) | `true`

Extract spectral flux, specified as true or false.

The spectral flux is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

To set parameters of the spectral flux extraction, use setExtractorParameters:

setExtractorParameters(afe,"spectralFlux",Name=Value)

Settable parameters for the spectral flux extraction are:

NormType –– Norm type used to calculate the spectral flux, specified as 1 or 2. If unspecified, NormType defaults to 2.

data types: logical
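
A spectral descriptor combines two independent choices: which spectrum it is computed on (SpectralDescriptorInput) and its own parameters. The sketch below, using an arbitrary noise signal at 16 kHz, computes the spectral flux of the mel spectrum with a 1-norm instead of the default 2-norm.

```matlab
fs = 16e3;
x = randn(fs,1);                       % one second of noise as a stand-in signal

% Compute the flux on the mel spectrum rather than the linear spectrum.
afe = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="melSpectrum", ...
    spectralFlux=true);

% Use the 1-norm of the frame-to-frame spectral difference.
setExtractorParameters(afe,"spectralFlux",NormType=1)
flux = extract(afe,x);

size(flux)                             % one flux value per hop
```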

`spectralkurtosis` — extract spectral kurtosis
`false` (default) | `true`

Extract spectral kurtosis, specified as true or false.

The spectral kurtosis is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralrolloffpoint` — extract spectral rolloff point
`false` (default) | `true`

Extract spectral rolloff point, specified as true or false.

The spectral rolloff point is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

To set parameters of the spectral rolloff point extraction, use setExtractorParameters:

setExtractorParameters(afe,"spectralRolloffPoint",Name=Value)

Settable parameters for the spectral rolloff point extraction are:

Threshold –– Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.

data types: logical
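
The rolloff point is the frequency below which a given fraction of the spectral energy lies, so lowering Threshold can only move it down or leave it unchanged. A minimal sketch of that behavior, assuming a noise input at an arbitrary 16 kHz sample rate:

```matlab
fs = 16e3;
x = randn(fs,1);                       % one second of noise as a stand-in signal

afe = audioFeatureExtractor(SampleRate=fs,spectralRolloffPoint=true);
rolloff95 = extract(afe,x);            % default Threshold of 0.95

setExtractorParameters(afe,"spectralRolloffPoint",Threshold=0.85)
rolloff85 = extract(afe,x);            % same frames, lower energy fraction

all(rolloff85 <= rolloff95)
```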

`spectralskewness` — extract spectral skewness
`false` (default) | `true`

Extract spectral skewness, specified as true or false.

The spectral skewness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralslope` — extract spectral slope
`false` (default) | `true`

Extract spectral slope, specified as true or false.

The spectral slope is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`spectralspread` — extract spectral spread
`false` (default) | `true`

Extract spectral spread, specified as true or false.

The spectral spread is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.

data types: logical

`pitch` — extract pitch
`false` (default) | `true`

Extract pitch, specified as true or false.

To set parameters of the pitch extraction, use setExtractorParameters:

setExtractorParameters(afe,"pitch",Name=Value)

Settable parameters for the pitch extraction are:

Method –– Method used to calculate the pitch, specified as "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.
Range –– Range within which to search for the pitch in Hz, specified as a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
MedianFilterLength –– Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).

data types: logical
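
The sketch below shows one way to combine the pitch parameters, assuming a synthetic harmonic tone with a 200 Hz fundamental as a stand-in for real speech. The search range and median filter length are illustrative choices, not recommended settings.

```matlab
fs = 16e3;
t = (0:fs-1)'/fs;

% Harmonic series with a 200 Hz fundamental (synthetic test signal).
x = zeros(size(t));
for k = 1:5
    x = x + sin(2*pi*200*k*t)/k;
end

afe = audioFeatureExtractor(SampleRate=fs,pitch=true);

% Narrow the search range and smooth the estimates over three hops.
setExtractorParameters(afe,"pitch", ...
    Method="NCF",Range=[120 400],MedianFilterLength=3)
f0 = extract(afe,x);

median(f0)                             % typical pitch estimate in Hz
```

Narrowing Range to the band where the fundamental is expected tends to reduce octave errors, at the cost of missing pitches outside that band.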

`harmonicratio` — extract harmonic ratio
`false` (default) | `true`

extract harmonic ratio, specified as true or false.

data types: logical

`zerocrossrate` — extract zero-crossing rate
`false` (default) | `true`

Extract zero-crossing rate, specified as true or false.

To set parameters of the zero-crossing rate extraction, use setExtractorParameters:

setExtractorParameters(afe,"zerocrossrate",Name=Value)

Settable parameters for the zero-crossing rate extraction are:

Method –– Method for computing the zero-crossing rate, specified as "difference" or "comparison". If unspecified, Method defaults to "difference". For more information, see zerocrossrate.
Level –– Signal level for which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor subtracts the Level value from the signal and then finds the zero crossings. If unspecified, Level defaults to 0.
Threshold –– Threshold above and below the Level value over which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor sets all the values of the input in the range [-Threshold, Threshold] to 0 and then finds the zero crossings. If unspecified, Threshold defaults to 0.
TransitionEdge –– Transitions to include when counting zero crossings, specified as "falling", "rising", or "both". If you specify "falling", only negative-going transitions are counted. If you specify "rising", only positive-going transitions are counted. If unspecified, TransitionEdge defaults to "both".
ZeroPositive –– Sign convention, specified as a logical scalar. If you specify ZeroPositive as true, then 0 is considered positive. If you specify ZeroPositive as false, then audioFeatureExtractor considers 0, -1, and 1 to have distinct signs following the convention of the sign function. If unspecified, ZeroPositive defaults to false.

data types: logical
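
To illustrate TransitionEdge, the sketch below counts crossings of a synthetic 440 Hz sine, first in both directions and then only on rising edges. The tone and its phase offset are arbitrary choices made so that no sample lands exactly on zero.

```matlab
fs = 16e3;
t = (0:fs-1)'/fs;
x = sin(2*pi*440*t + pi/7);            % sine with a phase offset (synthetic test signal)

afe = audioFeatureExtractor(SampleRate=fs,zerocrossrate=true);
zBoth = extract(afe,x);                % default: rising and falling transitions

setExtractorParameters(afe,"zerocrossrate",TransitionEdge="rising")
zRising = extract(afe,x);              % positive-going transitions only

all(zRising <= zBoth)
```

Because rising transitions are a subset of all transitions, the rising-edge rate cannot exceed the rate counted in both directions.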

`shorttimeenergy` — extract short-time energy
`false` (default) | `true`

extract short-time energy, specified as true or false. the short-time energy is computed using

ste = sum(xbw.^2,1),

where xbw is the buffered and windowed signal.

example: chirp function

Generate a chirp sampled at 1 kHz for 3 seconds. The instantaneous frequency is 100 Hz at $t = 0$ and crosses 200 Hz at $t = 1$ second. Divide the signal into 103-sample segments with 43 samples of overlap between adjoining segments. Window each segment with a periodic Hamming window.

fs = 1e3;
x = chirp(0:1/fs:3,100,1,200)';
win = hamming(103,"periodic");
nover = 43;
[xb,~] = buffer(x,length(win),nover,"nodelay");
xbw = xb.*win;

compute the short-time energy using the definition.

edef = sum(xbw.^2,1)';

Use audioFeatureExtractor to compute the short-time energy.

eafe = extract(audioFeatureExtractor(shortTimeEnergy=true, ...
    SampleRate=fs,Window=win,OverlapLength=nover),x);

verify that both procedures give the same short-time energy.

dff = max(abs(eafe-edef))

dff = 0

data types: logical

object functions

extract	Extract audio features
setExtractorParameters	Set nondefault parameter values for individual feature extractors
info	Output mapping and individual feature extractor parameters
generateMATLABFunction	Create MATLAB function compatible with C/C++ code generation
plotFeatures	Plot extracted audio features

examples

extract multiple audio features

read in an audio signal.

[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.

afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);

call extract to extract the audio features from the audio signal.

features = extract(afe,audioin);

Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(afe)

idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41
       zerocrossrate: 42
     shortTimeEnergy: 43

plot the detected pitch over time.

t = linspace(0,size(audioin,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("pitch")
xlabel("time (s)")
ylabel("frequency (hz)")

[Figure: line plot titled "pitch" with time (s) on the x-axis and frequency (hz) on the y-axis.]

plot the zero-crossing rate over time.

plot(t,features(:,idx.zerocrossrate))
title("zero-crossing rate")
xlabel("time (s)")

[Figure: line plot titled "zero-crossing rate" with time (s) on the x-axis.]

Plot the short-time energy over time.

plot(t,features(:,idx.shortTimeEnergy))
title("short-time energy")
xlabel("time (s)")

[Figure: line plot titled "short-time energy" with time (s) on the x-axis.]

extract features from dataset

create an audio datastore that points to audio samples included with audio toolbox®.

folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);

Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.

keepfile = cellfun(@(x)contains(x,"44p1"),ads.Files);
ads = subset(ads,keepfile);

Convert the data to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple workers. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.

adstall = tall(ads)

starting parallel pool (parpool) using the 'local' profile ...
connected to the parallel pool (number of workers: 6).
adstall =
  m×1 tall cell array
    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :         :
        :         :

create an audiofeatureextractor object to extract the mel spectrum, bark spectrum, erb spectrum, and linear spectrum from each audio file. use the default analysis window and overlap length for the spectrum extraction.

afe = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);

Define a cellfun function so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.

specstall = cellfun(@(x)extract(afe,x),adstall,UniformOutput=false);
specs = gather(specstall);

evaluating tall expression using the parallel pool 'local':
- pass 1 of 1: completed in 14 sec
evaluation completed in 14 sec

the specs variable returned from gather is a numfiles-by-1 cell array, where numfiles is the number of files in the datastore. each element of the cell array is a numhops-by-numfeatures-by-numchannels array, where the number of hops and number of channels depends on the length and number of channels of the audio file, and the number of features is the requested number of features from the audio data.
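
The size of the feature dimension can be cross-checked without running the datastore pipeline. The sketch below, which assumes the same four-spectrum configuration as this example, sums the number of columns that info assigns to each feature and compares the total with FeatureVectorLength.

```matlab
afe = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);

idx = info(afe);                             % column indices for each enabled feature
columnsPerFeature = structfun(@numel,idx)    % columns occupied by each spectrum
numFeatures = sum(columnsPerFeature);

numFeatures == afe.FeatureVectorLength
```

The total is the feature count that every element of specs shares in its second dimension, regardless of file length or channel count.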

numfiles = numel(specs)

numfiles = 12

[numhops1,numfeaturesfile1,numchannelsfile1] = size(specs{1})

numhops1 = 1053

numfeaturesfile1 = 620

numchannelsfile1 = 1

[numhops2,numfeaturesfile2,numchannelsfile2] = size(specs{2})

numhops2 = 443

numfeaturesfile2 = 620

numchannelsfile2 = 1

visualize extracted audio features

Use plotFeatures to visualize audio features extracted with an audioFeatureExtractor object.

Read in an audio signal from a file.

[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the gammatone cepstral coefficients (GTCCs) and the delta of the GTCCs. Set the SampleRate property to the sample rate of the audio signal, and use the default values for the other properties.

afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);

Plot the features extracted from the audio signal.

plotFeatures(afe,audioin)

[Figure: two image panels titled "gtcc" and "gtcc delta", each with time (s) on the x-axis and coefficient on the y-axis.]

algorithms

the audiofeatureextractor creates a feature extraction pipeline based on your selected features. to reduce computations, audiofeatureextractor reuses intermediary representations and outputs some intermediate representations as features.

for example, to create an object that extracts the centroid of the bark spectrum, the flux of the bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the mfcc, specify the audiofeatureextractor as follows.

afe = audioFeatureExtractor( ...
     SpectralDescriptorInput="barkSpectrum", ...
     spectralCentroid=true, ...
     spectralFlux=true, ...
     pitch=true, ...
     harmonicRatio=true, ...
     mfccDeltaDelta=true)

afe = 
  audioFeatureExtractor with properties:
   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio
   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

this configuration corresponds to the highlighted feature extraction pipeline.

note

because audiofeatureextractor reuses intermediary representations, the features output from audiofeatureextractor might not correspond with the default configuration of features output by corresponding individual feature extractors.
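
One way to see which settings the pipeline actually applies is to request the second output of info, which the object function list describes as returning individual feature extractor parameters. The sketch below assumes the two-output form `[idx,params] = info(afe)` and simply displays both structures for a small configuration.

```matlab
afe = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    pitch=true);

% idx maps each enabled feature to its output columns;
% params holds the parameters used by the individual extractors.
[idx,params] = info(afe);

disp(idx)
disp(params)
```

Comparing params with the defaults of the corresponding standalone functions shows where the two configurations differ.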

extended capabilities

C/C++ code generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

You cannot generate code directly from audioFeatureExtractor. You can generate C/C++ code from the function returned by generateMATLABFunction.
Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see the MATLAB Coder documentation.
zerocrossrate code generation does not support disabling dynamic memory allocation when the input is multichannel.
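
A minimal sketch of the code generation workflow described in these notes, assuming that generateMATLABFunction accepts a file name as its second argument. The function name extractAudioFeatures is a hypothetical choice, and the final codegen step is left as a comment because it requires MATLAB Coder.

```matlab
fs = 16e3;
x = randn(fs,1);                       % one second of noise as a stand-in signal

afe = audioFeatureExtractor(SampleRate=fs,mfcc=true);

% Write a standalone function equivalent to extract(afe,...) to the current folder.
% "extractAudioFeatures" is a hypothetical file name chosen for this sketch.
generateMATLABFunction(afe,"extractAudioFeatures");

fromObject = extract(afe,x);
fromFunction = extractAudioFeatures(x);
isequal(size(fromObject),size(fromFunction))

% With MATLAB Coder installed, the generated function can then be compiled, e.g.:
% codegen extractAudioFeatures -args {x}
```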

gpu arrays
accelerate code by running on a graphics processing unit (gpu) using parallel computing toolbox™.

this function fully supports gpu arrays. for more information, see run matlab functions on a gpu (parallel computing toolbox).

version history

introduced in r2019b

R2023a: Generate optimized C/C++ code for computing auditory spectrum

Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized C/C++ code generation using single instruction, multiple data (SIMD) instructions.

R2022b: Visualize extracted features

Use the plotFeatures object function to visualize extracted audio features.

R2020b: Computation of deltas and delta-deltas

The audioDelta function is now used to compute mfccDelta, mfccDeltaDelta, gtccDelta, and gtccDeltaDelta. The audioDelta algorithm has a different startup behavior than the previous algorithm. The default window length used to compute the deltas has changed from 2 to 9. A delta window length of 2 is no longer supported.
