streamline audio feature extraction

since r2019b

description

audiofeatureextractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

creation

description

afe = audiofeatureextractor() creates an audio feature extractor with default property values.

afe = audiofeatureextractor(name=value) specifies nondefault properties for afe using one or more name-value arguments.

properties

main properties

window –– analysis window, specified as a real vector.

data types: single | double

overlaplength –– overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(window)).

data types: single | double

fftlength –– fft length, specified as an integer. the default value of [] means that the fft length is equal to the window length, numel(window).

data types: single | double
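
as a rough illustration of how the fft length relates to the size of a one-sided spectrum, the following sketch resolves the [] default and counts the bins. it assumes the usual one-sided layout of floor(n/2)+1 bins (dc through nyquist). the variable names are illustrative and are not part of the object's interface.

```matlab
% Illustrative values (assumptions for this sketch only).
win = ones(1024,1);          % stand-in for an analysis window
fftLength = [];              % [] means "use the window length"

% Resolve the effective FFT length as described above.
if isempty(fftLength)
    effectiveFFTLength = numel(win);
else
    effectiveFFTLength = fftLength;
end

% Number of bins in a one-sided spectrum (DC through Nyquist).
numOneSidedBins = floor(effectiveFFTLength/2) + 1;
```

with a 1024-point window and the default fft length, this gives 513 bins per frame.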

samplerate –– input sample rate in hz, specified as a positive scalar.

data types: single | double

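spectraldescriptorinput –– input to spectral descriptors, specified as "linearspectrum", "melspectrum", "barkspectrum", or "erbspectrum".

spectral descriptors affected by this property are: spectralcentroid, spectralcrest, spectraldecrease, spectralentropy, spectralflatness, spectralflux, spectralkurtosis, spectralrolloffpoint, spectralskewness, spectralslope, and spectralspread.

the spectrum input to the spectral descriptors is the same as the output from the corresponding feature: linearspectrum, melspectrum, barkspectrum, or erbspectrum.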

for example, if you set spectraldescriptorinput to "barkspectrum", and spectralcentroid to true, then afe returns the centroid of the default bark spectrum.

[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
afe = audioFeatureExtractor(SampleRate=fs, ...
                            SpectralDescriptorInput="barkSpectrum", ...
                            spectralCentroid=true);
barkspectralcentroid = extract(afe,audioin);

if you specify a nondefault barkspectrum using setextractorparameters, then the nondefault bark spectrum is the input to the spectral descriptors. for example, if you call setextractorparameters(afe,"barkspectrum",numbands=40), then afe returns the centroid of a 40-band bark spectrum.

setExtractorParameters(afe,"barkSpectrum",NumBands=40)
bark40spectralcentroid = extract(afe,audioin);

data types: char | string

this property is read-only.

featurevectorlength –– total number of features output from extract for the current object configuration, specified as a positive integer. featurevectorlength is equal to the second dimension of the output from the extract function.

data types: single | double

features to extract

linearspectrum –– extract the one-sided linear spectrum, specified as true or false.

to set parameters of the linear spectrum extraction, use setextractorparameters:

setExtractorParameters(afe,"linearSpectrum",Name=Value)

settable parameters for the linear spectrum extraction are:
  • frequencyrange –– frequency range of the extracted spectrum in hz, specified as a two-element vector of increasing numbers in the range [0, samplerate/2]. if unspecified, frequencyrange defaults to [0, samplerate/2].

  • spectrumtype –– spectrum type, specified as "power" or "magnitude". if unspecified, spectrumtype defaults to "power".

  • windownormalization –– apply window normalization, specified as true or false. if unspecified, windownormalization defaults to true.

data types: logical

melspectrum –– extract the one-sided mel spectrum, specified as true or false.

to set parameters of the mel spectrum extraction, use setextractorparameters:

setExtractorParameters(afe,"melSpectrum",Name=Value)

settable parameters for the mel spectrum extraction are:
  • frequencyrange –– frequency range of the extracted spectrum in hz, specified as a two-element vector of increasing numbers in the range [0, samplerate/2]. if unspecified, frequencyrange defaults to [0, samplerate/2].

  • spectrumtype –– spectrum type, specified as "power" or "magnitude". if unspecified, spectrumtype defaults to "power".

  • numbands –– number of mel bands, specified as an integer. if unspecified, numbands defaults to 32.

  • filterbanknormalization –– normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". if unspecified, filterbanknormalization defaults to "bandwidth".

  • windownormalization –– apply window normalization, specified as true or false. if unspecified, windownormalization defaults to true.

  • filterbankdesigndomain –– domain in which the filter bank is designed, specified as either "linear" or "warped". if unspecified, filterbankdesigndomain defaults to "linear".

data types: logical

barkspectrum –– extract the one-sided bark spectrum, specified as true or false.

to set parameters of the bark spectrum extraction, use setextractorparameters:

setExtractorParameters(afe,"barkSpectrum",Name=Value)

settable parameters for the bark spectrum extraction are:
  • frequencyrange –– frequency range of the extracted spectrum in hz, specified as a two-element vector of increasing numbers in the range [0, samplerate/2]. if unspecified, frequencyrange defaults to [0, samplerate/2].

  • spectrumtype –– spectrum type, specified as "power" or "magnitude". if unspecified, spectrumtype defaults to "power".

  • numbands –– number of bark bands, specified as an integer. if unspecified, numbands defaults to 32.

  • filterbanknormalization –– normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". if unspecified, filterbanknormalization defaults to "bandwidth".

  • windownormalization –– apply window normalization, specified as true or false. if unspecified, windownormalization defaults to true.

  • filterbankdesigndomain –– domain in which the filter bank is designed, specified as either "linear" or "warped". if unspecified, filterbankdesigndomain defaults to "linear".

data types: logical

erbspectrum –– extract the one-sided erb spectrum, specified as true or false.

to set parameters of the erb spectrum extraction, use setextractorparameters:

setExtractorParameters(afe,"erbSpectrum",Name=Value)

settable parameters for the erb spectrum extraction are:
  • frequencyrange –– frequency range of the extracted spectrum in hz, specified as a two-element vector of increasing numbers in the range [0, samplerate/2]. if unspecified, frequencyrange defaults to [0, samplerate/2].

  • spectrumtype –– spectrum type, specified as "power" or "magnitude". if unspecified, spectrumtype defaults to "power".

  • numbands –– number of erb bands, specified as an integer. if unspecified, numbands defaults to ceil(hz2erb(frequencyrange(2))-hz2erb(frequencyrange(1))).

  • filterbanknormalization –– normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". if unspecified, filterbanknormalization defaults to "bandwidth".

  • windownormalization –– apply window normalization, specified as true or false. if unspecified, windownormalization defaults to true.

data types: logical
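
to make the default erb band count concrete, the sketch below evaluates ceil(hz2erb(f2) - hz2erb(f1)) for the default frequency range at a 44.1 khz sample rate. the conversion is written out as an anonymous function, hz2erbSketch, which is a hypothetical stand-in that assumes the glasberg-moore erb-rate formula; the toolbox's hz2erb function may differ in detail.

```matlab
% Hz-to-ERB conversion written out explicitly (assumed formula,
% not a call into the toolbox).
hz2erbSketch = @(hz) (1000*log(10)/(24.7*4.37)) * log10(1 + 0.00437*hz);

fs = 44100;                         % sample rate in Hz
frequencyRange = [0 fs/2];          % default range: 0 to Nyquist

% Default number of ERB bands for this frequency range.
numBands = ceil(hz2erbSketch(frequencyRange(2)) - hz2erbSketch(frequencyRange(1)));
```

under these assumptions the default works out to 43 bands at 44.1 khz.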

mfcc –– extract mel-frequency cepstral coefficients (mfcc), specified as true or false.

to set parameters of the mfcc extraction, use setextractorparameters:

setExtractorParameters(afe,"mfcc",Name=Value)

settable parameters for the mfcc extraction are:
  • numcoeffs –– number of coefficients returned for each window, specified as a positive integer. if unspecified, numcoeffs defaults to 13.

  • deltawindowlength –– delta window length, specified as an odd integer greater than 2. if unspecified, deltawindowlength defaults to 9. this parameter affects the mfccdelta and mfccdeltadelta features.

  • rectification –– type of nonlinear rectification, specified as "log" or "cubic-root".

the mel-frequency cepstral coefficients are calculated using the melspectrum.

data types: logical

mfccdelta –– extract delta of mfcc, specified as true or false.

the delta mfcc is calculated based on the extracted mfcc. parameters set on mfcc affect mfccdelta.
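
one common way to compute a delta over a window of deltawindowlength frames is a least-squares slope estimate. the sketch below is a minimal version of that idea, applied to a toy ramp. it is an assumption about the general technique, not a statement of the toolbox's exact implementation, and all variable names are illustrative.

```matlab
deltaWindowLength = 9;                 % odd integer greater than 2
M = floor(deltaWindowLength/2);        % half-width of the delta window

% Least-squares slope coefficients over k = -M..M, ordered for filter().
b = (M:-1:-M) / (2*sum((1:M).^2));

% Toy coefficient track: a ramp that rises by 3 per frame.
x = 3*(0:19)';
delta = filter(b,1,x);                 % causal, so the estimate lags by M frames

% Outputs computed from a full window recover the slope of the ramp.
steadyState = delta(deltaWindowLength:end);
```

the first deltawindowlength-1 outputs are computed from a partially filled window, so only the later values equal the true slope of 3.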

data types: logical

mfccdeltadelta –– extract delta-delta of mfcc, specified as true or false.

the delta-delta mfcc is calculated based on the extracted mfcc. parameters set on mfcc affect mfccdeltadelta.

data types: logical

gtcc –– extract gammatone cepstral coefficients (gtcc), specified as true or false.

to set parameters of the gtcc extraction, use setextractorparameters:

setExtractorParameters(afe,"gtcc",Name=Value)

settable parameters for the gtcc extraction are:
  • numcoeffs –– number of coefficients returned for each window, specified as a positive integer. if unspecified, numcoeffs defaults to 13.

  • deltawindowlength –– delta window length, specified as an odd integer greater than 2. if unspecified, deltawindowlength defaults to 9. this parameter affects the gtccdelta and gtccdeltadelta features.

  • rectification –– type of nonlinear rectification, specified as "log" or "cubic-root".

the gammatone cepstral coefficients are calculated using the erbspectrum.

data types: logical

gtccdelta –– extract delta of gtcc, specified as true or false.

the delta gtcc is calculated based on the extracted gtcc. parameters set on gtcc affect gtccdelta.

data types: logical

gtccdeltadelta –– extract delta-delta of gtcc, specified as true or false.

the delta-delta gtcc is calculated based on the extracted gtcc. parameters set on gtcc affect gtccdeltadelta.

data types: logical

spectralcentroid –– extract spectral centroid, specified as true or false.

the spectral centroid is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical
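
for orientation, the spectral centroid is conventionally the magnitude-weighted mean frequency of a frame. the sketch below applies that textbook definition to a small made-up spectrum. the frequencies and values are illustrative, and the toolbox's exact computation is assumed rather than confirmed to match.

```matlab
f = [0 250 500 750 1000];    % bin center frequencies in Hz (illustrative)
s = [0 1 2 1 0];             % spectral values for one frame (illustrative)

% Weighted mean of the frequencies, with the spectrum as the weights.
centroid = sum(f.*s) / sum(s);
```

because this toy spectrum is symmetric about 500 hz, the centroid is 500 hz.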

spectralcrest –– extract spectral crest, specified as true or false.

the spectral crest is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectraldecrease –– extract spectral decrease, specified as true or false.

the spectral decrease is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectralentropy –– extract spectral entropy, specified as true or false.

the spectral entropy is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectralflatness –– extract spectral flatness, specified as true or false.

the spectral flatness is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectralflux –– extract spectral flux, specified as true or false.

the spectral flux is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

to set parameters of the spectral flux extraction, use setextractorparameters:

setExtractorParameters(afe,"spectralFlux",Name=Value)

settable parameters for the spectral flux extraction are:
  • normtype –– norm type used to calculate the spectral flux, specified as 1 or 2. if unspecified, normtype defaults to 2.

data types: logical
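
to show what the norm type changes, the sketch below computes a flux value between two consecutive frames as the p-norm of their bin-wise difference, for p = 1 and p = 2. this is the textbook definition applied to made-up data; whether the toolbox applies additional scaling is not assumed here.

```matlab
% Two consecutive frames of a spectrum: rows are frames, columns are bins.
S = [1 2 3;
     1 5 7];

d = abs(diff(S,1,1));          % bin-wise change between frames: [0 3 4]

flux1 = sum(d,2);              % norm type 1: sum of absolute differences
flux2 = sqrt(sum(d.^2,2));     % norm type 2: Euclidean length of the difference
```

for this data the 1-norm gives 7 and the 2-norm gives 5, so the 2-norm weights a few large changes more heavily than many small ones.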

spectralkurtosis –– extract spectral kurtosis, specified as true or false.

the spectral kurtosis is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectralrolloffpoint –– extract spectral rolloff point, specified as true or false.

the spectral rolloff point is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

to set parameters of the spectral rolloff point extraction, use setextractorparameters:

setExtractorParameters(afe,"spectralRolloffPoint",Name=Value)

settable parameters for the spectral rolloff point extraction are:
  • threshold –– threshold of the rolloff point, specified as a scalar in the range (0, 1). if unspecified, threshold defaults to 0.95.

data types: logical
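
the role of the threshold can be sketched as follows: the rolloff point is taken as the lowest frequency at which the cumulative spectrum reaches the given fraction of its total. the data and variable names are illustrative, and details such as interpolation between bins are assumptions that the toolbox may handle differently.

```matlab
f = [0 100 200 300 400];     % bin center frequencies in Hz (illustrative)
s = [4 3 2 1 0];             % spectral values for one frame (illustrative)
threshold = 0.95;            % fraction of the total to capture

c = cumsum(s);               % cumulative spectrum: [4 7 9 10 10]

% First bin at which the cumulative sum reaches threshold * total.
k = find(c >= threshold*c(end),1,"first");
rolloffPoint = f(k);
```

here 95% of the total (9.5 of 10) is first reached at the fourth bin, so the rolloff point is 300 hz.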

spectralskewness –– extract spectral skewness, specified as true or false.

the spectral skewness is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectralslope –– extract spectral slope, specified as true or false.

the spectral slope is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

spectralspread –– extract spectral spread, specified as true or false.

the spectral spread is calculated on the spectral representation selected by the spectraldescriptorinput property: linearspectrum, melspectrum, barkspectrum, or erbspectrum.

data types: logical

pitch –– extract pitch, specified as true or false.

to set parameters of the pitch extraction, use setextractorparameters:

setExtractorParameters(afe,"pitch",Name=Value)

settable parameters for the pitch extraction are:
  • method –– method used to calculate the pitch, specified as "pef", "ncf", "cep", "lhs", or "srh". if unspecified, method defaults to "ncf". for a description of available pitch extraction methods, see the pitch function.

  • range –– range within which to search for the pitch in hz, specified as a two-element row vector of increasing values. if unspecified, range defaults to [50,400].

  • medianfilterlength –– median filter length used to smooth pitch estimates over time, specified as a positive integer. if unspecified, medianfilterlength defaults to 1 (no median filtering).

data types: logical
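
the effect of median filtering on a pitch track can be illustrated with the base matlab function movmedian. this is only a sketch of the smoothing idea on made-up values; it assumes a centered window that shrinks at the edges, and the toolbox's own edge handling may differ.

```matlab
pitchTrack = [100 102 300 101 99]';   % Hz; the 300 is an isolated outlier
medianFilterLength = 3;               % frames per median window

% Centered moving median; windows shrink at the ends of the track.
smoothed = movmedian(pitchTrack,medianFilterLength);

% A length of 1 leaves the track unchanged (no median filtering).
unfiltered = movmedian(pitchTrack,1);
```

with a three-frame window the outlier is replaced by a neighboring value, giving [101 102 102 101 100].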

harmonicratio –– extract harmonic ratio, specified as true or false.

data types: logical

zerocrossrate –– extract zero-crossing rate, specified as true or false.

to set parameters of the zero-crossing rate extraction, use setextractorparameters:

setExtractorParameters(afe,"zerocrossrate",Name=Value)

settable parameters for the zero-crossing rate extraction are:
  • method –– method for computing the zero-crossing rate, specified as "difference" or "comparison". if unspecified, method defaults to "difference". for more information, see the zerocrossrate function.

  • level –– signal level for which the crossing rate is computed, specified as a real scalar. audiofeatureextractor subtracts the level value from the signal and then finds the zero crossings. if unspecified, level defaults to 0.

  • threshold –– threshold above and below the level value over which the crossing rate is computed, specified as a real scalar. audiofeatureextractor sets all the values of the input in the range [–threshold, threshold] to 0 and then finds the zero crossings. if unspecified, threshold defaults to 0.

  • transitionedge –– transitions to include when counting zero crossings, specified as "falling", "rising", or "both". if you specify "falling", only negative-going transitions are counted. if you specify "rising", only positive-going transitions are counted. if unspecified, transitionedge defaults to "both".

  • zeropositive –– sign convention, specified as a logical scalar. if you specify zeropositive as true, then 0 is considered positive. if you specify zeropositive as false, then audiofeatureextractor considers 0, –1, and 1 to have distinct signs following the convention of the sign function. if unspecified, zeropositive defaults to false.

data types: logical
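
the level, threshold, transition edge, and sign convention parameters can be pictured with a short base matlab sketch that counts crossings in one frame. it follows the preprocessing described above (subtract the level, then zero out values within the threshold) and treats 0 as positive. normalizing the count into a rate, and the exact behavior of the "difference" and "comparison" methods, are left out and may differ in the toolbox.

```matlab
x = [0.5 -0.2 0.05 -0.6 0.7 0.1];   % one frame of samples (illustrative)
level = 0;                          % crossing level
threshold = 0.1;                    % dead zone around the level

y = x - level;                      % shift so that crossings are about zero
y(abs(y) <= threshold) = 0;         % values inside the dead zone become 0

isPos = y >= 0;                     % zero counted as positive in this sketch

% Count transitions between consecutive samples by direction.
rising  = sum(~isPos(1:end-1) &  isPos(2:end));   % negative to positive
falling = sum( isPos(1:end-1) & ~isPos(2:end));   % positive to negative
both    = rising + falling;
```

for this frame there are two rising and two falling transitions, four in total.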

shorttimeenergy –– extract short-time energy, specified as true or false. the short-time energy is computed using

ste = sum(xbw.^2,1),

where xbw is the buffered and windowed signal.

example: chirp function

generate a chirp sampled at 1 khz for 3 seconds. the instantaneous frequency is 100 hz at t=0 and crosses 200 hz at t=1 second. divide the signal into 103-sample segments with 43 samples of overlap between adjoining segments. window each segment with a periodic hamming window.

fs = 1e3;
x = chirp(0:1/fs:3,100,1,200)';
win = hamming(103,"periodic");
nover = 43;
[xb,~] = buffer(x,length(win),nover,"nodelay");
xbw = xb.*win;

compute the short-time energy using the definition.

edef = sum(xbw.^2,1)';

use audiofeatureextractor to compute the short-time energy.

eafe = extract(audioFeatureExtractor(shortTimeEnergy=true, ...
    SampleRate=fs,Window=win,OverlapLength=nover),x);

verify that both procedures give the same short-time energy.

dff = max(abs(eafe-edef))
dff = 0

data types: logical

object functions

extract –– extract audio features
setextractorparameters –– set nondefault parameter values for individual feature extractors
info –– output mapping and individual feature extractor parameters
generatematlabfunction –– create matlab function compatible with c/c++ code generation
plotfeatures –– plot extracted audio features

examples

read in an audio signal.

[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

create an audiofeatureextractor object that extracts the mfcc, delta mfcc, delta-delta mfcc, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. use a 30 ms analysis window with 20 ms overlap.

afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);

call extract to extract the audio features from the audio signal.

features = extract(afe,audioin);

use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(afe)
idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41
       zerocrossrate: 42
     shortTimeEnergy: 43

plot the detected pitch over time.

t = linspace(0,size(audioin,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("pitch")
xlabel("time (s)")
ylabel("frequency (hz)")

[figure: line plot titled "pitch"; x-axis time (s), y-axis frequency (hz).]

plot the zero-crossing rate over time.

plot(t,features(:,idx.zerocrossrate))
title("zero-crossing rate")
xlabel("time (s)")

[figure: line plot titled "zero-crossing rate"; x-axis time (s).]

plot the short-time energy over time.

plot(t,features(:,idx.shortTimeEnergy))
title("short-time energy")
xlabel("time (s)")

[figure: line plot titled "short-time energy"; x-axis time (s).]

create an audio datastore that points to audio samples included with audio toolbox®.

folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);

find all files that correspond to a sample rate of 44.1 khz and then subset the datastore.

keepfile = cellfun(@(x)contains(x,"44p1"),ads.Files);
ads = subset(ads,keepfile);

convert the data to a tall array. tall arrays are evaluated only when you request them explicitly using gather. matlab® automatically optimizes the queued calculations by minimizing the number of passes through the data. if you have parallel computing toolbox™, you can spread the calculations across multiple workers. the audio data is represented as an m-by-1 tall cell array, where m is the number of files in the audio datastore.

adstall = tall(ads)
starting parallel pool (parpool) using the 'local' profile ...
connected to the parallel pool (number of workers: 6).
adstall =
  m×1 tall cell array
    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :         :
        :         :

create an audiofeatureextractor object to extract the mel spectrum, bark spectrum, erb spectrum, and linear spectrum from each audio file. use the default analysis window and overlap length for the spectrum extraction.

afe = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);

define a cellfun function so that audio features are extracted from each cell of the tall array. call gather to evaluate the tall array.

specstall = cellfun(@(x)extract(afe,x),adstall,UniformOutput=false);
specs = gather(specstall);
evaluating tall expression using the parallel pool 'local':
- pass 1 of 1: completed in 14 sec
evaluation completed in 14 sec

the specs variable returned from gather is a numfiles-by-1 cell array, where numfiles is the number of files in the datastore. each element of the cell array is a numhops-by-numfeatures-by-numchannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the number of features requested from the audio data.

numfiles = numel(specs)
numfiles = 12
[numhops1,numfeaturesfile1,numchanelsfile1] = size(specs{1})
numhops1 = 1053
numfeaturesfile1 = 620
numchanelsfile1 = 1
[numhops2,numfeaturesfile2,numchanelsfile2] = size(specs{2})
numhops2 = 443
numfeaturesfile2 = 620
numchanelsfile2 = 1
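
the sizes reported above can be reproduced with simple arithmetic. the sketch below assumes that only full analysis windows are counted (no padding of the final partial window), uses the default 1024-sample window with 512 samples of overlap, and takes 43 as the default erb band count at 44.1 khz. these are assumptions chosen to be consistent with the output shown, and the variable names are illustrative.

```matlab
windowLength  = 1024;                    % default window length
overlapLength = 512;                     % default overlap length
hopLength = windowLength - overlapLength;

% Lengths of the first two files listed in the tall array.
numSamples = [539648 227497];

% Number of full windows that fit in each signal.
numHops = floor((numSamples - windowLength)/hopLength) + 1;

% Feature count: one-sided linear spectrum plus mel, bark, and ERB bands.
numLinearBins = floor(windowLength/2) + 1;   % 513 one-sided bins
numMel  = 32;                                % documented default
numBark = 32;                                % documented default
numErb  = 43;                                % assumption: default at 44.1 kHz
numFeatures = numLinearBins + numMel + numBark + numErb;
```

this gives 1053 and 443 hops for the two files and 620 features per hop, matching the sizes returned by size(specs{1}) and size(specs{2}).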

use plotfeatures to visualize audio features extracted with an audiofeatureextractor object.

read in an audio signal from a file.

[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

create an audiofeatureextractor object that extracts the gammatone cepstral coefficients (gtccs) and the delta of the gtccs. set the samplerate property to the sample rate of the audio signal, and use the default values for the other properties.

afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);

plot the features extracted from the audio signal.

plotFeatures(afe,audioin)

[figure: two image plots titled "gtcc" and "gtcc delta"; x-axis time (s), y-axis coefficient.]

algorithms

the audiofeatureextractor creates a feature extraction pipeline based on your selected features. to reduce computations, audiofeatureextractor reuses intermediary representations and outputs some intermediate representations as features.

for example, to create an object that extracts the centroid of the bark spectrum, the flux of the bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the mfcc, specify the audiofeatureextractor as follows.

afe = audioFeatureExtractor( ...
     SpectralDescriptorInput="barkSpectrum", ...
     spectralCentroid=true, ...
     spectralFlux=true, ...
     pitch=true, ...
     harmonicRatio=true, ...
     mfccDeltaDelta=true)
afe = 
  audiofeatureextractor with properties:
   properties
                     window: [1024×1 double]
              overlaplength: 512
                 samplerate: 44100
                  fftlength: []
    spectraldescriptorinput: 'barkspectrum'
   enabled features
     mfccdeltadelta, spectralcentroid, spectralflux, pitch, harmonicratio
   disabled features
     linearspectrum, melspectrum, barkspectrum, erbspectrum, mfcc, mfccdelta
     gtcc, gtccdelta, gtccdeltadelta, spectralcrest, spectraldecrease, spectralentropy
     spectralflatness, spectralkurtosis, spectralrolloffpoint, spectralskewness, spectralslope, spectralspread
   to extract a feature, set the corresponding property to true.
   for example, obj.mfcc = true, adds mfcc to the list of enabled features.
this configuration corresponds to the highlighted feature extraction pipeline.

note

because audiofeatureextractor reuses intermediary representations, the features output from audiofeatureextractor might not correspond with the default configuration of features output by corresponding individual feature extractors.

extended capabilities

version history

introduced in r2019b