Streamline audio feature extraction
Since R2019b
Description
audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.
Creation
Description
afe = audioFeatureExtractor() creates an audio feature extractor with default property values.
afe = audioFeatureExtractor(Name=Value) specifies nondefault properties for afe using one or more name-value arguments.
Properties
Main Properties
Window – Analysis window
hamming(1024,"periodic") (default) | real vector
Analysis window, specified as a real vector.
Data Types: single | double
OverlapLength – Overlap length of adjacent analysis windows
512 (default) | integer in the range [0, numel(Window))
Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).
Data Types: single | double
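The number of analysis frames follows from Window and OverlapLength. The sketch below computes it in plain MATLAB, assuming that only complete frames are analyzed (a trailing partial frame is dropped rather than zero-padded). Under that assumption it reproduces the numhops1 and numhops2 values reported in the dataset example on this page.

```matlab
% Sketch: number of analysis frames (hops) for given Window/OverlapLength.
% Assumption: only complete frames are analyzed; a trailing partial frame is dropped.
winLength = 1024;                  % numel(hamming(1024,"periodic")), the default Window
overlapLength = 512;               % default OverlapLength
hopLength = winLength - overlapLength;

numSamples = [539648; 227497];     % lengths of the first two files in the dataset example
numHops = floor((numSamples - winLength)/hopLength) + 1   % returns [1053; 443]
```

The same arithmetic shows why OverlapLength must stay strictly below numel(Window): at OverlapLength = numel(Window), the hop length would be zero.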
FFTLength – FFT length
[] (default) | positive integer
FFT length, specified as a positive integer. The default value of [] means that the FFT length is equal to the window length, numel(Window).
Data Types: single | double
SampleRate – Input sample rate (Hz)
44100 (default) | positive scalar
Input sample rate in Hz, specified as a positive scalar.
Data Types: single | double
SpectralDescriptorInput – Input to spectral descriptors
"linearSpectrum" (default) | "melSpectrum" | "barkSpectrum" | "erbSpectrum"
Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".
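The spectral descriptors affected by this property are spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.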
The spectrum input to the spectral descriptors is the same as the output from the corresponding spectrum feature. If you use the default parameters for that spectrum, the descriptors are computed on the default spectrum. For example, if you set SpectralDescriptorInput to "barkSpectrum" and spectralCentroid to true, then afe returns the centroid of the default Bark spectrum.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
afe = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true);
barkspectralcentroid = extract(afe,audioin);
If you set nondefault parameters for barkSpectrum using setExtractorParameters, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParameters(afe,"barkSpectrum",NumBands=40), then afe returns the centroid of a 40-band Bark spectrum.
setExtractorParameters(afe,"barkSpectrum",NumBands=40)
bark40spectralcentroid = extract(afe,audioin);
Data Types: char | string
FeatureVectorLength – Number of features output from extract
positive integer
This property is read-only.
Total number of features output from extract for the current object configuration, specified as a positive integer. FeatureVectorLength is equal to the second dimension of the output from the extract function.
Data Types: single | double
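A quick check of the relationship between FeatureVectorLength and extract, assuming Audio Toolbox is installed. With default parameters, mfcc contributes 13 columns (as in the info output in the examples) and pitch contributes one, so the expected length is 14. Random noise is only a placeholder for real audio.

```matlab
% Requires Audio Toolbox. White noise is only a placeholder input.
fs = 16e3;
afe = audioFeatureExtractor(SampleRate=fs,mfcc=true,pitch=true);

numFeatures = afe.FeatureVectorLength     % 13 MFCC + 1 pitch

x = randn(fs,1);                          % 1 second of noise
features = extract(afe,x);                % numHops-by-FeatureVectorLength
size(features,2) == afe.FeatureVectorLength
```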
Features to Extract
linearSpectrum – Extract linear spectrum
false (default) | true
Extract the one-sided linear spectrum, specified as true or false.
To set parameters of the linear spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"linearSpectrum",Name=Value)
FrequencyRange – Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType – Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
WindowNormalization – Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
Data Types: logical
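The sketch below, assuming Audio Toolbox, checks the size of the linear spectrum and then narrows it with setExtractorParameters. The 513 bins assume the default 1024-point window with FFTLength = [] (1024/2 + 1 one-sided bins over [0, SampleRate/2]). The exact bin count after restricting FrequencyRange depends on how the band edges map to FFT bins, so the sketch only checks that it shrinks.

```matlab
% Requires Audio Toolbox.
afe = audioFeatureExtractor(SampleRate=44.1e3,linearSpectrum=true);
fullBins = afe.FeatureVectorLength       % 1024/2 + 1 one-sided bins

% Keep only 0-4 kHz and return the magnitude instead of the power spectrum.
setExtractorParameters(afe,"linearSpectrum", ...
    FrequencyRange=[0 4000],SpectrumType="magnitude");
reducedBins = afe.FeatureVectorLength    % fewer bins than fullBins
```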
melSpectrum – Extract mel spectrum
false (default) | true
Extract the one-sided mel spectrum, specified as true or false.
To set parameters of the mel spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"melSpectrum",Name=Value)
FrequencyRange – Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType – Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands – Number of mel bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization – Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization – Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain – Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
Data Types: logical
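For example, to request 64 mel bands with area-normalized filters (a sketch assuming Audio Toolbox; each mel band contributes one column to the output of extract):

```matlab
% Requires Audio Toolbox.
afe = audioFeatureExtractor(SampleRate=16e3,melSpectrum=true);

% 64 mel bands, filters normalized by area instead of bandwidth
setExtractorParameters(afe,"melSpectrum",NumBands=64,FilterBankNormalization="area");

numBands = afe.FeatureVectorLength   % one column per mel band
```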
barkSpectrum – Extract Bark spectrum
false (default) | true
Extract the one-sided Bark spectrum, specified as true or false.
To set parameters of the Bark spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"barkSpectrum",Name=Value)
FrequencyRange – Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType – Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands – Number of Bark bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization – Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization – Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain – Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
Data Types: logical
erbSpectrum – Extract ERB spectrum
false (default) | true
Extract the one-sided ERB spectrum, specified as true or false.
To set parameters of the ERB spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"erbSpectrum",Name=Value)
FrequencyRange – Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType – Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands – Number of ERB bands, specified as an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2)) - hz2erb(FrequencyRange(1))).
FilterBankNormalization – Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization – Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
Data Types: logical
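The default ERB band count can be worked out by hand. The sketch below assumes that hz2erb implements the Glasberg and Moore ERB-rate scale, erb(f) = A*log10(1 + 0.00437*f) with A = 1000*ln(10)/(24.7*4.37); the anonymous function erbRate is a hypothetical stand-in, not the toolbox function. For SampleRate = 44100 it gives 43 bands, which is consistent with the 620 features per frame reported in the dataset example (32 mel + 32 Bark + 43 ERB + 513 linear bins).

```matlab
% Sketch: default NumBands of the ERB spectrum.
% Assumption: hz2erb follows the Glasberg-Moore ERB-rate scale below.
A = 1000*log(10)/(24.7*4.37);             % about 21.33
erbRate = @(f) A*log10(1 + 0.00437*f);    % hypothetical stand-in for hz2erb

fs = 44100;
frequencyRange = [0 fs/2];                % default FrequencyRange
numBands = ceil(erbRate(frequencyRange(2)) - erbRate(frequencyRange(1)))   % returns 43
```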
mfcc – Extract mel-frequency cepstral coefficients (MFCC)
false (default) | true
Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.
To set parameters of the MFCC extraction, use setExtractorParameters:
setExtractorParameters(afe,"mfcc",Name=Value)
NumCoeffs – Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength – Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.
Rectification – Type of nonlinear rectification, specified as "log" or "cubic-root".
The mel-frequency cepstral coefficients are calculated using the mel spectrum.
Data Types: logical
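Because parameters set on mfcc also apply to the deltas, changing NumCoeffs changes the width of both features. A sketch assuming Audio Toolbox:

```matlab
% Requires Audio Toolbox.
afe = audioFeatureExtractor(SampleRate=16e3,mfcc=true,mfccDelta=true);

% Parameters set on "mfcc" also apply to mfccDelta.
setExtractorParameters(afe,"mfcc",NumCoeffs=20,Rectification="cubic-root");

numFeatures = afe.FeatureVectorLength   % 20 MFCC + 20 delta MFCC
```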
mfccDelta – Extract delta of MFCC
false (default) | true
Extract delta of MFCC, specified as true or false.
The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.
Data Types: logical
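The delta features measure how each coefficient changes across frames. The sketch below uses the common regression formula d(t) = sum(n.*c(t+n))/sum(n.^2) for n = -M to M, with DeltaWindowLength = 2M+1. It is an illustration only: it skips the first and last M frames, whereas audioDelta has its own startup behavior, so edge values differ from the toolbox output.

```matlab
% Sketch of a regression-based delta over a window of 2*M+1 frames.
% Assumption: illustrative formula only; edge (startup) frames are not computed.
deltaWindowLength = 9;                  % default DeltaWindowLength
M = (deltaWindowLength - 1)/2;
n = (-M:M)';                            % regression offsets
c = (1:20)';                            % toy coefficient track: a unit-slope ramp

d = nan(size(c));
for t = M+1:numel(c)-M
    d(t) = sum(n .* c(t+n)) / sum(n.^2);
end
d(M+1:end-M)'                           % interior deltas of the ramp are all 1
```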
mfccDeltaDelta – Extract delta-delta of MFCC
false (default) | true
Extract delta-delta of MFCC, specified as true or false.
The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.
Data Types: logical
gtcc – Extract gammatone cepstral coefficients (GTCC)
false (default) | true
Extract gammatone cepstral coefficients (GTCC), specified as true or false.
To set parameters of the GTCC extraction, use setExtractorParameters:
setExtractorParameters(afe,"gtcc",Name=Value)
NumCoeffs – Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength – Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.
Rectification – Type of nonlinear rectification, specified as "log" or "cubic-root".
The gammatone cepstral coefficients are calculated using the ERB spectrum.
Data Types: logical
gtccDelta – Extract delta of GTCC
false (default) | true
Extract delta of GTCC, specified as true or false.
The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.
Data Types: logical
gtccDeltaDelta – Extract delta-delta of GTCC
false (default) | true
Extract delta-delta of GTCC, specified as true or false.
The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.
Data Types: logical
spectralCentroid – Extract spectral centroid
false (default) | true
Extract spectral centroid, specified as true or false.
The spectral centroid is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
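The spectral centroid is usually defined as the spectrum-weighted mean frequency, sum(f.*s)/sum(s). The sketch below applies that definition to a hypothetical four-bin spectrum in plain MATLAB; it illustrates the definition and is not taken from the toolbox implementation.

```matlab
% Sketch: spectral centroid of one frame, centroid = sum(f.*s)/sum(s).
f = [0; 100; 200; 300];     % hypothetical bin center frequencies (Hz)
s = [0; 1; 2; 1];           % hypothetical spectrum of one frame
centroid = sum(f.*s)/sum(s) % returns 200
```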
spectralCrest – Extract spectral crest
false (default) | true
Extract spectral crest, specified as true or false.
The spectral crest is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
spectralDecrease – Extract spectral decrease
false (default) | true
Extract spectral decrease, specified as true or false.
The spectral decrease is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
spectralEntropy – Extract spectral entropy
false (default) | true
Extract spectral entropy, specified as true or false.
The spectral entropy is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
spectralFlatness – Extract spectral flatness
false (default) | true
Extract spectral flatness, specified as true or false.
The spectral flatness is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
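Spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the spectrum: it equals 1 for a perfectly flat spectrum and approaches 0 for a peaky one. A plain-MATLAB sketch of that definition on hypothetical spectra (not the toolbox implementation):

```matlab
% Sketch: spectral flatness = geometric mean / arithmetic mean of the spectrum.
flatness = @(s) exp(mean(log(s))) / mean(s);

flatness([1; 1; 1; 1])             % flat spectrum: 1
flatness([1; 0.01; 0.01; 0.01])    % one dominant bin: close to 0
```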
spectralFlux – Extract spectral flux
false (default) | true
Extract spectral flux, specified as true or false.
The spectral flux is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
To set parameters of the spectral flux extraction, use setExtractorParameters:
setExtractorParameters(afe,"spectralFlux",Name=Value)
NormType – Norm type used to calculate the spectral flux, specified as 1 or 2. If unspecified, NormType defaults to 2.
Data Types: logical
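NormType selects the p-norm of the frame-to-frame spectral difference. The sketch below computes that norm for a hypothetical three-frame spectrum in plain MATLAB; it covers frame transitions only and does not reproduce how the toolbox handles the first frame.

```matlab
% Sketch: spectral flux between consecutive frames using a p-norm.
% Columns of S are frames; p plays the role of NormType (1 or 2).
S = [1 1 2;
     0 2 2;
     0 0 1];                          % hypothetical 3-bin, 3-frame spectrum
p = 2;                                % default NormType
flux = vecnorm(diff(S,1,2), p, 1)     % one value per frame transition: [2 sqrt(2)]
```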
spectralKurtosis – Extract spectral kurtosis
false (default) | true
Extract spectral kurtosis, specified as true or false.
The spectral kurtosis is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
spectralRolloffPoint – Extract spectral rolloff point
false (default) | true
Extract spectral rolloff point, specified as true or false.
The spectral rolloff point is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
To set parameters of the spectral rolloff point extraction, use setExtractorParameters:
setExtractorParameters(afe,"spectralRolloffPoint",Name=Value)
Threshold – Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.
Data Types: logical
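The rolloff point is the frequency below which a fraction Threshold of the total spectral energy lies. A plain-MATLAB sketch on a hypothetical spectrum, assuming the rolloff is reported at the first bin where the cumulative sum reaches that fraction (the toolbox may index or interpolate bins differently):

```matlab
% Sketch: rolloff = first bin frequency where cumulative energy >= threshold*total.
f = (0:100:900)';                                 % hypothetical bin frequencies (Hz)
s = [5; 3; 1; 0.6; 0.2; 0.2; 0; 0; 0; 0];         % hypothetical spectrum
threshold = 0.95;                                 % default Threshold
idx = find(cumsum(s) >= threshold*sum(s), 1);
rolloff = f(idx)                                  % returns 300
```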
spectralSkewness – Extract spectral skewness
false (default) | true
Extract spectral skewness, specified as true or false.
The spectral skewness is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
spectralSlope – Extract spectral slope
false (default) | true
Extract spectral slope, specified as true or false.
The spectral slope is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
spectralSpread – Extract spectral spread
false (default) | true
Extract spectral spread, specified as true or false.
The spectral spread is calculated on the spectral representation specified by the SpectralDescriptorInput property: the linear, mel, Bark, or ERB spectrum.
Data Types: logical
pitch – Extract pitch
false (default) | true
Extract pitch, specified as true or false.
To set parameters of the pitch extraction, use setExtractorParameters:
setExtractorParameters(afe,"pitch",Name=Value)
Method – Method used to calculate the pitch, specified as "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF".
Range – Range within which to search for the pitch in Hz, specified as a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
MedianFilterLength – Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).
Data Types: logical
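A sketch of tuning the pitch extractor, assuming Audio Toolbox. The 30 ms window (480 samples at 16 kHz) is longer than one period at the 60 Hz lower search limit (about 267 samples), and the 150 Hz sine is only a synthetic stand-in for a voiced signal.

```matlab
% Requires Audio Toolbox. A 150 Hz sine stands in for a voiced signal.
fs = 16e3;
afe = audioFeatureExtractor(SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    pitch=true);

% Narrow the search range and smooth the estimates with a 3-point median filter.
setExtractorParameters(afe,"pitch",Method="NCF",Range=[60 300],MedianFilterLength=3);

t = (0:fs-1)'/fs;
x = sin(2*pi*150*t);
f0 = extract(afe,x);        % one pitch estimate (Hz) per analysis frame
```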
harmonicRatio – Extract harmonic ratio
false (default) | true
Extract harmonic ratio, specified as true or false.
Data Types: logical
zerocrossrate – Extract zero-crossing rate
false (default) | true
Extract zero-crossing rate, specified as true or false.
To set parameters of the zero-crossing rate extraction, use setExtractorParameters:
setExtractorParameters(afe,"zerocrossrate",Name=Value)
Method – Method for computing the zero-crossing rate, specified as "difference" or "comparison". If unspecified, Method defaults to "difference".
Level – Signal level for which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor subtracts the Level value from the signal and then finds the zero crossings. If unspecified, Level defaults to 0.
Threshold – Threshold above and below the Level value over which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor sets all the values of the input in the range [–Threshold, Threshold] to 0 and then finds the zero crossings. If unspecified, Threshold defaults to 0.
TransitionEdge – Transitions to include when counting zero crossings, specified as "falling", "rising", or "both". If you specify "falling", only negative-going transitions are counted. If you specify "rising", only positive-going transitions are counted. If unspecified, TransitionEdge defaults to "both".
ZeroPositive – Sign convention, specified as a logical scalar. If you specify ZeroPositive as true, then 0 is considered positive. If you specify ZeroPositive as false, then audioFeatureExtractor considers 0, –1, and 1 to have distinct signs, so 0 is treated as neither positive nor negative. If unspecified, ZeroPositive defaults to false.
Data Types: logical
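The Level and Threshold preprocessing described above can be sketched in plain MATLAB. This is a hypothetical illustration with ZeroPositive = true, so every sample maps to +1 or –1 after thresholding. It counts transitions by edge but does not attempt to reproduce the toolbox's rate normalization or the "comparison" method.

```matlab
% Hypothetical illustration of the zero-crossing preprocessing (not the toolbox code).
x = [0.5 -0.2 0.05 0.8 -0.9 -0.01 0.7]';
level = 0;                        % Level
threshold = 0.1;                  % Threshold

y = x - level;                    % subtract Level
y(abs(y) <= threshold) = 0;       % values in [-threshold, threshold] become 0
sgn = 2*(y >= 0) - 1;             % ZeroPositive = true: 0 counts as positive
transitions = diff(sgn);          % +2 rising, -2 falling, 0 no crossing

numRising  = nnz(transitions > 0) % TransitionEdge = "rising"
numFalling = nnz(transitions < 0) % TransitionEdge = "falling"
numBoth    = numRising + numFalling
```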
shortTimeEnergy – Extract short-time energy
false (default) | true
Extract short-time energy, specified as true or false. The short-time energy is computed using
ste = sum(xbw.^2,1)
where xbw is the buffered and windowed signal.
Example: Chirp Function
Generate a chirp sampled at 1 kHz for 3 seconds. The instantaneous frequency is 100 Hz at t = 0 and crosses 200 Hz at t = 1 second. Divide the signal into 103-sample segments with 43 samples of overlap between adjoining segments. Window each segment with a periodic Hamming window.
fs = 1e3;
x = chirp(0:1/fs:3,100,1,200)';
win = hamming(103,"periodic");
nover = 43;
[xb,~] = buffer(x,length(win),nover,"nodelay");
xbw = xb.*win;
Compute the short-time energy using the definition.
edef = sum(xbw.^2,1)';
Use audioFeatureExtractor to compute the short-time energy.
eafe = extract(audioFeatureExtractor(shortTimeEnergy=true, ...
    SampleRate=fs,Window=win,OverlapLength=nover),x);
Verify that both procedures give the same short-time energy.
dff = max(abs(eafe-edef))
dff = 0
Data Types: logical
Object Functions
extract – Extract audio features
setExtractorParameters – Set nondefault parameter values for individual feature extractors
info – Output mapping and individual feature extractor parameters
Create MATLAB function compatible with C/C++ code generation
plotFeatures – Plot extracted audio features
Examples
Extract Multiple Audio Features
Read in an audio signal.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);
Call extract to extract the audio features from the audio signal.
features = extract(afe,audioin);
Use info to determine which columns of the feature matrix correspond to each requested feature, including pitch.
idx = info(afe)
idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41
       zerocrossrate: 42
     shortTimeEnergy: 43
Plot the detected pitch over time.
t = linspace(0,size(audioin,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("Pitch")
xlabel("Time (s)")
ylabel("Frequency (Hz)")
Plot the zero-crossing rate over time.
plot(t,features(:,idx.zerocrossrate))
title("Zero-Crossing Rate")
xlabel("Time (s)")
Plot the short-time energy over time.
plot(t,features(:,idx.shortTimeEnergy))
title("Short-Time Energy")
xlabel("Time (s)")
Extract Features from Dataset
Create an audio datastore that points to audio samples included with Audio Toolbox®.
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);
Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.
keepfile = cellfun(@(x)contains(x,"44p1"),ads.Files);
ads = subset(ads,keepfile);
Convert the data to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple workers. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.
adstall = tall(ads)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
adstall =
  M×1 tall cell array
    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :
        :
Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.
afe = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);
Use cellfun so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.
specstall = cellfun(@(x)extract(afe,x),adstall,UniformOutput=false);
specs = gather(specstall);
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 14 sec
Evaluation completed in 14 sec
The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the number of features requested from the audio data.
numfiles = numel(specs)
numfiles = 12
[numhops1,numfeaturesfile1,numchannelsfile1] = size(specs{1})
numhops1 = 1053
numfeaturesfile1 = 620
numchannelsfile1 = 1
[numhops2,numfeaturesfile2,numchannelsfile2] = size(specs{2})
numhops2 = 443
numfeaturesfile2 = 620
numchannelsfile2 = 1
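To see where the 620 columns come from, you can sum the number of columns that info reports for each enabled feature. A sketch assuming Audio Toolbox and the same extractor configuration as in this example:

```matlab
% Requires Audio Toolbox. Same configuration as the dataset example.
afe = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true,barkSpectrum=true,erbSpectrum=true,linearSpectrum=true);

idx = info(afe);                          % column indices for each feature
numPerFeature = structfun(@numel,idx)     % columns used by each feature
total = sum(numPerFeature)                % matches afe.FeatureVectorLength
```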
Visualize Extracted Audio Features
Use plotFeatures to visualize audio features extracted with an audioFeatureExtractor object.
Read in an audio signal from a file.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the gammatone cepstral coefficients (GTCCs) and the delta of the GTCCs. Set the SampleRate property to the sample rate of the audio signal, and use the default values for the other properties.
afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);
Plot the features extracted from the audio signal.
plotFeatures(afe,audioin)
Algorithms
audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediary representations and outputs some intermediate representations as features.
For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify audioFeatureExtractor as follows.
afe = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    spectralFlux=true, ...
    pitch=true, ...
    harmonicRatio=true, ...
    mfccDeltaDelta=true)
afe = 
  audioFeatureExtractor with properties:
   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio
   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
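As the display indicates, features can also be enabled after construction by setting the corresponding property. A sketch assuming Audio Toolbox: the five enabled features occupy 13 + 1 + 1 + 1 + 1 = 17 columns, and enabling mfcc with its default 13 coefficients adds 13 more.

```matlab
% Requires Audio Toolbox.
afe = audioFeatureExtractor(SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true,spectralFlux=true,pitch=true, ...
    harmonicRatio=true,mfccDeltaDelta=true);
numBefore = afe.FeatureVectorLength   % 13 delta-delta MFCC + 4 scalar features

afe.mfcc = true;                      % enable another feature after construction
numAfter = afe.FeatureVectorLength    % 13 more columns
```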
Note
Because audioFeatureExtractor reuses intermediary representations, the features output from audioFeatureExtractor might not correspond with the default configuration of features output by the corresponding individual feature extractors.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
You cannot generate code directly from audioFeatureExtractor. You can generate C/C++ code from the MATLAB function that you create from an audioFeatureExtractor object for code generation. Such functions that compute an auditory spectrum (mel, Bark, ERB) support optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see the MATLAB Coder documentation.
zerocrossrate code generation does not support disabling dynamic memory allocation when the input is multichannel.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2019b
R2023a: Generate optimized C/C++ code for computing auditory spectrum
MATLAB functions created from an audioFeatureExtractor object for code generation that compute an auditory spectrum (mel, Bark, ERB) support optimized C/C++ code generation using single instruction, multiple data (SIMD) instructions.
R2022b: Visualize extracted features
Use the plotFeatures object function to visualize extracted audio features.
R2020b: Computation of deltas and delta-deltas
The audioDelta function is now used to compute mfccDelta, mfccDeltaDelta, gtccDelta, and gtccDeltaDelta. The audioDelta algorithm has a different startup behavior than the previous algorithm. The default window length used to compute the deltas has changed from 2 to 9. A delta window length of 2 is no longer supported.
See Also
audioDatastore | audioDataAugmenter