Augment audio data
Since R2019b
Description
Enlarge your audio dataset using audio-specific augmentation techniques such as pitch shifting, time-scale modification, time shifting, noise addition, and volume control. You can create cascaded or parallel augmentation pipelines to apply multiple algorithms deterministically or probabilistically.
Creation
Description
aug = audioDataAugmenter() creates an audio data augmenter object with default property values.
aug = audioDataAugmenter(Name,Value) specifies nondefault properties for aug using one or more name-value arguments.
Properties
Augmentation Pipeline
AugmentationMode - Augmentation mode
'sequential' (default) | 'independent'
Augmentation mode, specified as 'sequential' or 'independent'.
'sequential': Augmentation algorithms are applied sequentially (in series).
'independent': Augmentation algorithms are applied independently (in parallel).
Data Types: char | string
AugmentationParameterSource - Source of augmentation parameters
'random' (default) | 'specify'
Source of augmentation parameters, specified as 'random' or 'specify'.
'random': Augmentation algorithms are applied probabilistically using a probability parameter and a range parameter. For example, to create an audioDataAugmenter that applies time stretching using a speedup factor between 0.5 and 1.5 with a 60% probability, enter the following in the Command Window:
aug = audioDataAugmenter('AugmentationParameterSource','random', ...
    'TimeStretchProbability',0.6, ...
    'SpeedupFactorRange',[0.5,1.5]);
When time stretching is applied, the speedup factor is drawn from a uniform distribution centered at 1 (the mean of the range), with a minimum of 0.5 and a maximum of 1.5.
'specify': Augmentation algorithms are applied deterministically using a logical parameter and a specified parameter value. For example, to create an audioDataAugmenter that applies time stretching using a speedup factor of 1.5 with a 100% probability, enter the following in the Command Window:
aug = audioDataAugmenter('AugmentationParameterSource','specify', ...
    'ApplyTimeStretch',true, ...
    'SpeedupFactor',1.5);
Data Types: char | string
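To make the difference between the two parameter sources concrete, here is a minimal base-MATLAB sketch of a single stage under each source. It mimics the behavior described above: a probability test, then a uniform draw from the range. It is not the toolbox implementation. The variable names are hypothetical, and treating a skipped stretch as a factor of 1 is an assumption.

```matlab
% Hypothetical sketch (not the toolbox implementation) of how one augmentation
% stage behaves under the two parameter sources.

% 'random': apply with probability p and draw the parameter uniformly from a range
p = 0.6;                      % plays the role of TimeStretchProbability
speedupRange = [0.5 1.5];     % plays the role of SpeedupFactorRange
applyRandom = rand < p;       % true with probability p
if applyRandom
    speedupRandom = speedupRange(1) + diff(speedupRange)*rand;   % uniform draw
else
    speedupRandom = 1;        % assumption: "not applied" is equivalent to a factor of 1
end

% Over many calls, the stage is applied in roughly a fraction p of them
fractionApplied = mean(rand(1,1e5) < p);

% 'specify': apply deterministically with a fixed value
applySpecify = true;          % plays the role of ApplyTimeStretch
speedupSpecified = 1.5;       % plays the role of SpeedupFactor
```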
NumAugmentations - Number of augmented signals to output
1 (default) | positive integer
Number of augmented signals to output, specified as a positive integer.
Dependencies
To enable this property, set AugmentationParameterSource to 'random'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Stretch Time
TimeStretchProbability - Probability of applying time stretch
0.5 (default) | scalar in the range [0, 1]
Probability of applying time stretch, specified as a scalar in the range [0, 1]. Set the probability to 1 to apply time stretching every time you call augment. Set the probability to 0 to skip time stretching every time you call augment.
Dependencies
To enable this property, set AugmentationParameterSource to 'random' and AugmentationMode to 'sequential'.
Data Types: single | double
SpeedupFactorRange - Range of time stretch speedup factor
[0.8 1.2] (default) | two-element row vector of positive nondecreasing values
Range of time stretch speedup factor, specified as a two-element row vector of positive nondecreasing values.
Dependencies
To enable this property, set AugmentationParameterSource to 'random'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
ApplyTimeStretch - Apply time stretch
true (default) | false
Apply time stretch, specified as true or false.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify' or AugmentationMode to 'independent'.
Data Types: logical
SpeedupFactor - Time stretch speedup factor
0.8 (default) | real positive scalar | real positive vector
Time stretch speedup factor, specified as a scalar or vector of real positive values.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
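You can read the effect of the speedup factor off the example outputs later on this page. Time stretching appears to change the signal length by roughly a factor of 1/SpeedupFactor. The following base-MATLAB sketch assumes that relation and recovers the implied factors from the lengths in the first example: 685056 samples in, and 505183 and 490728 samples out. The relation is only approximate. In the specified sequential example, the output lengths differ from N/SpeedupFactor by up to a few hundred samples.

```matlab
% Original signal length and two augmented lengths from the random sequential example
nOriginal  = 685056;
nAugmented = [505183 490728];

% Assumption: time stretching shortens a signal by roughly a factor of SpeedupFactor,
% so the ratio of lengths approximates the factor that was drawn
impliedSpeedup = nOriginal ./ nAugmented;    % approximately 1.356 and 1.396

% Both values fall inside the SpeedupFactorRange of [1.3 1.4] used in that example
inRange = all(impliedSpeedup >= 1.3 & impliedSpeedup <= 1.4);
disp(impliedSpeedup)
```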
Shift Pitch
PitchShiftProbability - Probability of applying pitch shift
0.5 (default) | scalar in the range [0, 1]
Probability of applying pitch shift, specified as a scalar in the range [0, 1]. Set the probability to 1 to apply pitch shifting every time you call augment. Set the probability to 0 to skip pitch shifting every time you call augment.
Dependencies
To enable this property, set AugmentationParameterSource to 'random' and AugmentationMode to 'sequential'.
Data Types: single | double
SemitoneShiftRange - Range of pitch shift (semitones)
[-2,2] (default) | two-element row vector of nondecreasing values
Range of pitch shift in semitones, specified as a two-element row vector of nondecreasing values.
Dependencies
To enable this property, set AugmentationParameterSource to 'random'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
ApplyPitchShift - Apply pitch shift
true (default) | false
Apply pitch shift, specified as true or false.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify' or AugmentationMode to 'independent'.
Data Types: logical
SemitoneShift - Pitch shift (semitones)
-3 (default) | real scalar | real vector
Pitch shift in semitones, specified as a real scalar or vector.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
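Semitone values map to frequency ratios through the standard equal-tempered definition, 2^(n/12). The base-MATLAB snippet below evaluates that ratio for the default SemitoneShift and for the bounds of the default range. It only illustrates the unit. It says nothing about how the toolbox performs the shift internally.

```matlab
% Pitch shift in semitones -> frequency ratio, using the equal-tempered definition
semitones = [-3 -2 2];          % -3 is the SemitoneShift default; -2 and 2 bound the default range
ratio = 2.^(semitones/12);      % approximately 0.8409, 0.8909, and 1.1225

% Opposite shifts cancel: down 2 semitones and then up 2 semitones gives a ratio of 1
roundTrip = ratio(2) * ratio(3);
```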
Control Volume
VolumeControlProbability - Probability of applying volume control
0.5 (default) | scalar in the range [0, 1]
Probability of applying volume control, specified as a scalar in the range [0, 1]. Set the probability to 1 to apply volume control every time you call augment. Set the probability to 0 to skip volume control every time you call augment.
Dependencies
To enable this property, set AugmentationParameterSource to 'random' and AugmentationMode to 'sequential'.
Data Types: single | double
VolumeGainRange - Range of volume gain (dB)
[-3,3] (default) | two-element row vector of nondecreasing values
Range of volume gain in dB, specified as a two-element row vector of nondecreasing values.
Dependencies
To enable this property, set AugmentationParameterSource to 'random'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
ApplyVolumeControl - Apply volume gain
true (default) | false
Apply volume gain, specified as true or false.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify' or AugmentationMode to 'independent'.
Data Types: logical
VolumeGain - Volume gain (dB)
-3 (default) | scalar | vector
Volume gain in dB, specified as a scalar or vector.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
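A gain in dB corresponds to a linear scale factor. The sketch below assumes the usual amplitude convention, scale = 10^(gain/20). The helper applyGainDb is hypothetical and is not a toolbox function. The sketch also shows why positive gains can push samples past ±1, which the offline augmentation example guards against by normalizing before audiowrite.

```matlab
% Hypothetical helper: scale a signal by a gain given in dB (amplitude convention, 20*log10)
applyGainDb = @(x, gainDb) x * 10^(gainDb/20);

x = [0.5; -0.25; 1];            % hypothetical signal samples
y = applyGainDb(x, -3);         % -3 dB is the VolumeGain default
scale = 10^(-3/20);             % approximately 0.7079

% Positive gains can push samples outside [-1, 1]
yLoud = applyGainDb(x, 3);
clips = any(abs(yLoud) > 1);    % true here, because 1 * 10^(3/20) > 1
```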
Add Noise
AddNoiseProbability - Probability of applying noise addition
0.5 (default) | scalar in the range [0, 1]
Probability of applying Gaussian white noise addition, specified as a scalar in the range [0, 1]. Set the probability to 1 to add noise every time you call augment. Set the probability to 0 to skip adding noise every time you call augment.
Dependencies
To enable this property, set AugmentationParameterSource to 'random' and AugmentationMode to 'sequential'.
Data Types: single | double
SNRRange - Range of noise addition SNR (dB)
[0,10] (default) | two-element row vector of nondecreasing values
Range of noise addition SNR in dB, specified as a two-element row vector of nondecreasing values.
Dependencies
To enable this property, set AugmentationParameterSource to 'random'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
ApplyAddNoise - Apply noise addition
true (default) | false
Apply Gaussian white noise addition, specified as true or false.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify' or AugmentationMode to 'independent'.
Data Types: logical
SNR - Noise addition SNR (dB)
5 (default) | scalar | vector
Noise addition SNR in dB, specified as a scalar or vector.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
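Adding noise at a target SNR means scaling the noise so that the ratio of signal power to noise power equals 10^(SNR/10). The following base-MATLAB sketch does this with white Gaussian noise from randn. The test tone is hypothetical, and the toolbox may measure signal power differently (for example, per channel). Treat this as an illustration of the definition rather than the toolbox's algorithm.

```matlab
% Add white Gaussian noise to x at a target SNR in dB (illustrative sketch)
fs = 8000;                        % hypothetical sample rate
t = (0:fs-1)'/fs;
x = 0.5*sin(2*pi*440*t);          % hypothetical 1-second test tone
targetSnrDb = 5;                  % the SNR default

noise = randn(size(x));
% Scale the noise so that signal power / noise power = 10^(SNR/10)
signalPower = mean(x.^2);
noisePower  = mean(noise.^2);
noise = noise * sqrt(signalPower / (noisePower * 10^(targetSnrDb/10)));
y = x + noise;

% Equals targetSnrDb up to floating-point rounding
measuredSnrDb = 10*log10(mean(x.^2) / mean(noise.^2));
```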
Shift Time
TimeShiftProbability - Probability of applying time shift
0.5 (default) | scalar in the range [0, 1]
Probability of applying time shift, specified as a scalar in the range [0, 1]. Set the probability to 1 to apply time shifting every time you call augment. Set the probability to 0 to skip time shifting every time you call augment.
Time shifting applies a circular shift to the time-domain audio data.
Dependencies
To enable this property, set AugmentationParameterSource to 'random' and AugmentationMode to 'sequential'.
Data Types: single | double
TimeShiftRange - Range of time shift (s)
[-5e-3,5e-3] (default) | two-element row vector of nondecreasing values
Range of time shift in seconds, specified as a two-element row vector of nondecreasing values.
Dependencies
To enable this property, set AugmentationParameterSource to 'random'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
ApplyTimeShift - Apply time shift
true (default) | false
Apply time shift, specified as true or false.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify' or AugmentationMode to 'independent'.
Time shifting applies a circular shift to the time-domain audio data.
Data Types: logical
TimeShift - Time shift (s)
5e-3 (default) | scalar | vector
Time shift in seconds, specified as a scalar or vector.
Dependencies
To enable this property, set AugmentationParameterSource to 'specify'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
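Because time shifting is circular, samples shifted past one end of the signal reappear at the other end, and the length does not change. A base-MATLAB sketch with circshift follows. Two details are assumptions, not documented toolbox behavior: converting seconds to samples with round(seconds*fs), and the sign convention in which a positive shift is a delay.

```matlab
% Circular time shift: samples pushed past one end wrap around to the other end
fs = 1000;                                % hypothetical sample rate, so 1 ms = 1 sample
x = (1:10)';
shiftSeconds = 3e-3;                      % hypothetical 3 ms shift
shiftSamples = round(shiftSeconds*fs);    % 3 samples
yLater   = circshift(x,  shiftSamples);   % [8 9 10 1 2 3 4 5 6 7]'
yEarlier = circshift(x, -shiftSamples);   % [4 5 6 7 8 9 10 1 2 3]'
```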
Object Functions
addAugmentationMethod | Add custom augmentation method |
removeAugmentationMethod | Remove custom augmentation method |
augment | Augment audio data |
setAugmenterParams | Set parameters of augmentation algorithm |
getAugmenterParams | Get parameters of augmentation algorithm |
Examples
Apply Random Sequential Augmentations
Read in an audio signal and listen to it.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
sound(audioin,fs)
Create an audioDataAugmenter object that applies time stretching, volume control, and time shifting in cascade. Apply each of these augmentations with 80% probability. Set NumAugmentations to 5 to output five independently augmented signals. To skip pitch shifting and noise addition for each augmentation, set their respective probabilities to 0. Define parameter ranges for each relevant augmentation algorithm.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","sequential", ...
    "NumAugmentations",5, ...
    ...
    "TimeStretchProbability",0.8, ...
    "SpeedupFactorRange", [1.3,1.4], ...
    ...
    "PitchShiftProbability",0, ...
    ...
    "VolumeControlProbability",0.8, ...
    "VolumeGainRange",[-5,5], ...
    ...
    "AddNoiseProbability",0, ...
    ...
    "TimeShiftProbability",0.8, ...
    "TimeShiftRange", [-500e-3,500e-3])
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: "sequential"
    AugmentationParameterSource: 'random'
    NumAugmentations: 5
    TimeStretchProbability: 0.8000
    SpeedupFactorRange: [1.3000 1.4000]
    PitchShiftProbability: 0
    VolumeControlProbability: 0.8000
    VolumeGainRange: [-5 5]
    AddNoiseProbability: 0
    TimeShiftProbability: 0.8000
    TimeShiftRange: [-0.5000 0.5000]
Call augment on the audio to create 5 augmentations. The augmented audio is returned in a table with variables Audio and AugmentationInfo. The number of rows in the table is defined by NumAugmentations.
data = augment(augmenter,audioin,fs)
data=5×2 table
      Audio          AugmentationInfo
_________________ ________________
{685056x1 double} 1x1 struct
{685056x1 double} 1x1 struct
{505183x1 double} 1x1 struct
{685056x1 double} 1x1 struct
{490728x1 double} 1x1 struct
In the current augmentation pipeline, augmentation parameters are assigned randomly from within the specified ranges. To determine the exact parameters used for an augmentation, inspect AugmentationInfo.
augmentationtoinspect = 4;
data.AugmentationInfo(augmentationtoinspect)
ans = struct with fields:
    SpeedupFactor: 1
    VolumeGain: 4.3399
    TimeShift: 0.4502
Listen to the augmentation you are inspecting. Plot the time-domain representations of the original and augmented signals.
augmentation = data.Audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Apply Specified Sequential Augmentations
Read in an audio signal and listen to it.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
sound(audioin,fs)
Create an audioDataAugmenter object that applies time stretching, pitch shifting, and noise corruption in cascade. Specify the time stretch speedup factors as 0.9, 1.1, and 1.2. Specify the pitch shifts as -2, -1, 1, and 2 semitones. Specify the noise corruption SNR as 10 dB and 15 dB.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","sequential", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.9,1.1,1.2], ...
    "ApplyTimeStretch",true, ...
    "ApplyPitchShift",true, ...
    "SemitoneShift",[-2,-1,1,2], ...
    "SNR",[10,15], ...
    "ApplyVolumeControl",false, ...
    "ApplyTimeShift",false)
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: "sequential"
    AugmentationParameterSource: "specify"
    ApplyTimeStretch: 1
    SpeedupFactor: [0.9000 1.1000 1.2000]
    ApplyPitchShift: 1
    SemitoneShift: [-2 -1 1 2]
    ApplyVolumeControl: 0
    ApplyAddNoise: 1
    SNR: [10 15]
    ApplyTimeShift: 0
Call augment on the audio to create 24 augmentations. The augmentations represent every combination of the specified augmentation parameters (3 speedup factors × 4 pitch shifts × 2 SNR values = 24).
data = augment(augmenter,audioin,fs)
data=24×2 table
      Audio          AugmentationInfo
_________________ ________________
{761243x1 double} 1x1 struct
{622888x1 double} 1x1 struct
{571263x1 double} 1x1 struct
{761243x1 double} 1x1 struct
{622888x1 double} 1x1 struct
{571263x1 double} 1x1 struct
{761243x1 double} 1x1 struct
{622888x1 double} 1x1 struct
{571263x1 double} 1x1 struct
{761243x1 double} 1x1 struct
{622888x1 double} 1x1 struct
{571263x1 double} 1x1 struct
{761243x1 double} 1x1 struct
{622888x1 double} 1x1 struct
{571263x1 double} 1x1 struct
{761243x1 double} 1x1 struct
⋮
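The 24 rows correspond to the Cartesian product of the specified values, which the snippet below enumerates with ndgrid. In the table above, the output lengths cycle through three values. That suggests the speedup factor varies fastest, as in ndgrid's column-major order. The first row (0.9, -2, 10) also matches the AugmentationInfo inspected next. Still, the row order here is an observation, not a documented guarantee. The variable names are hypothetical.

```matlab
speedup   = [0.9 1.1 1.2];    % SpeedupFactor values from the example
semitones = [-2 -1 1 2];      % SemitoneShift values
snrDb     = [10 15];          % SNR values

% Every combination of the specified values; ndgrid varies the first input fastest
[s, p, n] = ndgrid(speedup, semitones, snrDb);
combos = [s(:) p(:) n(:)];    % 24-by-3: one row per augmentation

numAugmentations = size(combos, 1);   % 3*4*2 = 24
```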
You can check the parameter configuration of each augmentation using the AugmentationInfo table variable.
augmentationtoinspect = 1;
data.AugmentationInfo(augmentationtoinspect)
ans = struct with fields:
    SpeedupFactor: 0.9000
    SemitoneShift: -2
    SNR: 10
Listen to the augmentation you are inspecting. Plot the time-domain representations of the original and augmented signals.
augmentation = data.Audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Apply Random Independent Augmentations
Read in an audio signal and listen to it.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
sound(audioin,fs)
Create an audioDataAugmenter object that applies noise corruption and time shifting in parallel branches. For the noise corruption branch, randomly apply noise with an SNR in the range 0 dB to 20 dB. For the time shifting branch, randomly apply a time shift in the range -300 ms to 300 ms. Apply each branch 2 times, for a total of 4 augmentations.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","independent", ...
    "AugmentationParameterSource","random", ...
    "NumAugmentations",2, ...
    "ApplyTimeStretch",false, ...
    "ApplyPitchShift",false, ...
    "ApplyVolumeControl",false, ...
    "SNRRange",[0,20], ...
    "TimeShiftRange",[-300e-3,300e-3])
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: "independent"
    AugmentationParameterSource: "random"
    NumAugmentations: 2
    ApplyTimeStretch: 0
    ApplyPitchShift: 0
    ApplyVolumeControl: 0
    ApplyAddNoise: 1
    SNRRange: [0 20]
    ApplyTimeShift: 1
    TimeShiftRange: [-0.3000 0.3000]
Call augment on the audio to create 4 augmentations.
data = augment(augmenter,audioin,fs);
You can check the parameter configuration of each augmentation using the AugmentationInfo table variable.
augmentationtoinspect = 4;
data.AugmentationInfo{augmentationtoinspect}
ans = struct with fields:
    TimeShift: 0.0016
Listen to the audio you are inspecting. Plot the time-domain representations of the original and augmented signals.
augmentation = data.Audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Apply Specified Independent Augmentations
Read in an audio signal and listen to it.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
sound(audioin,fs)
Create an audioDataAugmenter object that applies volume control, noise corruption, and time shifting in parallel branches.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","independent", ...
    "AugmentationParameterSource","specify", ...
    "ApplyTimeStretch",false, ...
    "ApplyPitchShift",false, ...
    "VolumeGain",2, ...
    "SNR",0, ...
    "TimeShift",2)
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: "independent"
    AugmentationParameterSource: "specify"
    ApplyTimeStretch: 0
    ApplyPitchShift: 0
    ApplyVolumeControl: 1
    VolumeGain: 2
    ApplyAddNoise: 1
    SNR: 0
    ApplyTimeShift: 1
    TimeShift: 2
Call augment on the audio to create 3 augmentations.
data = augment(augmenter,audioin,fs)
data=3×2 table
      Audio          AugmentationInfo
_________________ ________________
{685056x1 double} {1x1 struct}
{685056x1 double} {1x1 struct}
{685056x1 double} {1x1 struct}
You can check the parameter configuration of each augmentation using the AugmentationInfo table variable.
augmentationtoinspect = 3;
data.AugmentationInfo{augmentationtoinspect}
ans = struct with fields:
    TimeShift: 2
Listen to the audio you are inspecting. Plot the time-domain representations of the original and augmented signals.
augmentation = data.Audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Augment Audio Dataset
The audioDataAugmenter supports multiple workflows for augmenting your datastore, including:
Offline augmentation
Augmentation using tall arrays
Augmentation using transform datastores
In each workflow, begin by creating an audio datastore that points to your audio data. In this example, you create an audio datastore that points to audio samples included with Audio Toolbox™. Count the number of files in the dataset.
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder)
ads = 
  audioDatastore with properties:
    Files: {
           ' ...\matlab\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav';
           ' ...\matlab\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav';
           ' ...\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'
            ... and 26 more
           }
    AlternateFileSystemRoots: {}
    OutputDataType: 'double'
    Labels: {}
numfilesindataset = numel(ads.Files)
numfilesindataset = 29
Create an audioDataAugmenter that applies random sequential augmentations. Set NumAugmentations to 2.
aug = audioDataAugmenter('NumAugmentations',2)
aug = 
  audioDataAugmenter with properties:
    AugmentationMode: 'sequential'
    AugmentationParameterSource: 'random'
    NumAugmentations: 2
    TimeStretchProbability: 0.5000
    SpeedupFactorRange: [0.8000 1.2000]
    PitchShiftProbability: 0.5000
    SemitoneShiftRange: [-2 2]
    VolumeControlProbability: 0.5000
    VolumeGainRange: [-3 3]
    AddNoiseProbability: 0.5000
    SNRRange: [0 10]
    TimeShiftProbability: 0.5000
    TimeShiftRange: [-0.0050 0.0050]
Offline Augmentation
To augment the audio dataset, create two augmentations of each file and then write the augmentations as WAV files.
while hasdata(ads)
    [audioin,info] = read(ads);
    data = augment(aug,audioin,info.SampleRate);
    [~,fn] = fileparts(info.FileName);
    for i = 1:size(data,1)
        augmentedaudio = data.Audio{i};
        % If augmentation caused an audio signal to have values outside of -1 and 1,
        % normalize the audio signal to avoid clipping when writing.
        if max(abs(augmentedaudio),[],'all')>1
            augmentedaudio = augmentedaudio/max(abs(augmentedaudio),[],'all');
        end
        audiowrite(sprintf('%s_aug%d.wav',fn,i),augmentedaudio,info.SampleRate)
    end
end
Create an audioDatastore that points to the augmented dataset, and confirm that the number of files in the dataset is double the original number of files.
augmentedads = audioDatastore(pwd)
augmentedads = 
  audioDatastore with properties:
    Files: {
           ' ...\examples\audio-ex28074079\Ambiance-16-44p1-mono-12secs_aug1.wav';
           ' ...\examples\audio-ex28074079\Ambiance-16-44p1-mono-12secs_aug2.wav';
           ' ...\examples\audio-ex28074079\AudioArray-16-16-4channels-20secs_aug1.wav'
            ... and 55 more
           }
    AlternateFileSystemRoots: {}
    OutputDataType: 'double'
    Labels: {}
numfilesinaugmenteddataset = numel(augmentedads.Files)
numfilesinaugmenteddataset = 58
Augment Using Tall Arrays
When you augment a dataset using tall arrays, the input data to the augmenter should be sampled at a consistent rate. Subset the original audio dataset to include only files with a sample rate of 44.1 kHz. Many datasets already use a single sample rate, in which case this step is unnecessary.
keepfile = cellfun(@(x)contains(x,'44p1'),ads.Files);
ads44p1 = subset(ads,keepfile);
fs = 44.1e3;
Convert the audio datastore to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.
adstall = tall(ads44p1)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
adstall =
  M×1 tall cell array
    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :        :
        :        :
Use cellfun to apply augmentation to each cell of the tall array. Call gather to evaluate the tall array.
augtall = cellfun(@(x)augment(aug,x,fs),adstall,"UniformOutput",false);
augmenteddataset = gather(augtall)
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 1 min 34 sec
Evaluation completed in 1 min 34 sec
augmenteddataset=12×1 cell array
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
The augmented dataset is returned as a numfiles-by-1 cell array, where numfiles is the number of files in the datastore. Each element of the cell array is a numaugmentationsperfile-by-2 table, where numaugmentationsperfile is the number of augmentations returned per file.
numfiles = numel(augmenteddataset)
numfiles = 12
numaugmentationsperfile = size(augmenteddataset{1},1)
numaugmentationsperfile = 2
Augment Using Transform Datastore
You can perform online data augmentation while you train your machine learning application by using a transform datastore. Call transform to create a new datastore that applies data augmentation while reading.
transformads = transform(ads,@(x,info)augment(aug,x,info),'IncludeInfo',true)
transformads = 
  TransformedDatastore with properties:
    UnderlyingDatastore: [1×1 audioDatastore]
    Transforms: {@(x,info)augment(aug,x,info)}
    IncludeInfo: 1
Call read to return the augmentations of the first file from the transform datastore.
augmentedread = read(transformads)
augmentedread=2×2 table
      Audio          AugmentationInfo
_________________ ________________
{539648×1 double} [1×1 struct]
{586683×1 double} [1×1 struct]
Add Custom Augmentation Method
You can expand the capabilities of audioDataAugmenter by adding custom augmentation methods.
Read in an audio signal and listen to it.
[audioin,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
sound(audioin,fs)
Create an audioDataAugmenter object. Set the probability of applying white noise to 0.
augmenter = audioDataAugmenter('AddNoiseProbability',0)
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: 'sequential'
    AugmentationParameterSource: 'random'
    NumAugmentations: 1
    TimeStretchProbability: 0.5000
    SpeedupFactorRange: [0.8000 1.2000]
    PitchShiftProbability: 0.5000
    SemitoneShiftRange: [-2 2]
    VolumeControlProbability: 0.5000
    VolumeGainRange: [-3 3]
    AddNoiseProbability: 0
    TimeShiftProbability: 0.5000
    TimeShiftRange: [-0.0050 0.0050]
Specify a custom augmentation algorithm that adds pink noise to the signal. The AddPinkNoise algorithm is added to the augmenter properties.
algorithmname = 'AddPinkNoise';
algorithmhandle = @(x)x+pinknoise(size(x),'like',x);
addAugmentationMethod(augmenter,algorithmname,algorithmhandle)
augmenter
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: 'sequential'
    AugmentationParameterSource: 'random'
    NumAugmentations: 1
    TimeStretchProbability: 0.5000
    SpeedupFactorRange: [0.8000 1.2000]
    PitchShiftProbability: 0.5000
    SemitoneShiftRange: [-2 2]
    VolumeControlProbability: 0.5000
    VolumeGainRange: [-3 3]
    AddNoiseProbability: 0
    TimeShiftProbability: 0.5000
    TimeShiftRange: [-0.0050 0.0050]
    AddPinkNoiseProbability: 0.5000
Set the probability of adding pink noise to 1.
augmenter.AddPinkNoiseProbability = 1
augmenter = 
  audioDataAugmenter with properties:
    AugmentationMode: 'sequential'
    AugmentationParameterSource: 'random'
    NumAugmentations: 1
    TimeStretchProbability: 0.5000
    SpeedupFactorRange: [0.8000 1.2000]
    PitchShiftProbability: 0.5000
    SemitoneShiftRange: [-2 2]
    VolumeControlProbability: 0.5000
    VolumeGainRange: [-3 3]
    AddNoiseProbability: 0
    TimeShiftProbability: 0.5000
    TimeShiftRange: [-0.0050 0.0050]
    AddPinkNoiseProbability: 1
Augment the original signal and listen to the result. Inspect the parameters of the augmentation algorithms applied.
data = augment(augmenter,audioin,fs);
sound(data.Audio{1},fs)
data.AugmentationInfo(1)
ans = struct with fields:
    SpeedupFactor: 1
    SemitoneShift: 0
    VolumeGain: 2.4803
    TimeShift: -0.0022
    AddPinkNoise: 'applied'
Plot the mel spectrograms of the original and augmented signals.
figure
melSpectrogram(audioin,fs)
title('Original Signal')
figure
melSpectrogram(data.Audio{1},fs)
title('Augmented Signal')
Algorithms
The audioDataAugmenter object enables you to configure your augmentation pipeline as deterministic or probabilistic using the AugmentationParameterSource property. You can also choose to apply the augmentations in series or in parallel using the AugmentationMode property. The following sections describe the pipelines you can create and the properties that apply to each architecture.
Random Sequential Augmentations
To define your augmentation as a sequence of probabilistically applied augmentations, set AugmentationParameterSource to 'random' and AugmentationMode to 'sequential'.
The augmentations are always applied in the same order. If you specify custom algorithms, they are applied at the end of the sequence, in the order you specified them.
In this pipeline configuration, these parameters apply:
Augmentation method | Parameters |
---|---|
Stretch time | TimeStretchProbability, SpeedupFactorRange |
Shift pitch | PitchShiftProbability, SemitoneShiftRange |
Control volume | VolumeControlProbability, VolumeGainRange |
Add noise | AddNoiseProbability, SNRRange |
Shift time | TimeShiftProbability, TimeShiftRange |
If you set NumAugmentations to a value greater than 1, the object applies NumAugmentations random sequential augmentations in parallel. Whether each augmentation is applied, and the value of any probabilistically determined parameter, are drawn independently for each of these parallel augmentations.
Specified Sequential Augmentations
To define your augmentation as a sequence of deterministically applied augmentations, set AugmentationParameterSource to 'specify' and AugmentationMode to 'sequential'.
The augmentations are always applied in the same order. If you specify custom algorithms, they are applied at the end of the sequence, in the order you specified them.
In this pipeline configuration, these parameters apply:
Augmentation method | Parameters |
---|---|
Stretch time | ApplyTimeStretch, SpeedupFactor |
Shift pitch | ApplyPitchShift, SemitoneShift |
Control volume | ApplyVolumeControl, VolumeGain |
Add noise | ApplyAddNoise, SNR |
Shift time | ApplyTimeShift, TimeShift |
If you specify the parameter of an augmentation method as a vector, each element of the vector creates a separate branch in the augmentation pipeline. For example, the following object creates an augmentation pipeline that results in four separate augmentations:
aug = audioDataAugmenter("AugmentationMode","sequential", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.8,1.2], ...
    "VolumeGain",[-3,-1])
aug = 
  audioDataAugmenter with properties:
    AugmentationMode: "sequential"
    AugmentationParameterSource: "specify"
    ApplyTimeStretch: 1
    SpeedupFactor: [0.8000 1.2000]
    ApplyPitchShift: 1
    SemitoneShift: -3
    ApplyVolumeControl: 1
    VolumeGain: [-3 -1]
    ApplyAddNoise: 1
    SNR: 5
    ApplyTimeShift: 1
    TimeShift: 0.0050
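The count of four comes from multiplying the number of values specified for each augmentation, because every sequential branch takes one value from each stage: 2 × 1 × 2 × 1 × 1 = 4. A quick base-MATLAB check of that arithmetic (numValues is a hypothetical name):

```matlab
% Values given per augmentation in the example above:
% SpeedupFactor [0.8 1.2], SemitoneShift -3, VolumeGain [-3 -1], SNR 5, TimeShift 0.005
numValues = [2 1 2 1 1];

% Sequential: each branch takes one value from every stage, so the counts multiply
numSequentialAugmentations = prod(numValues);   % 4
```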
Random Independent Augmentations
To define your augmentation as independently applied augmentations with randomly determined parameters, set AugmentationParameterSource to 'random' and AugmentationMode to 'independent'.
In this pipeline configuration, these parameters apply:
Augmentation method | Parameters |
---|---|
Stretch time | ApplyTimeStretch, SpeedupFactorRange |
Shift pitch | ApplyPitchShift, SemitoneShiftRange |
Control volume | ApplyVolumeControl, VolumeGainRange |
Add noise | ApplyAddNoise, SNRRange |
Shift time | ApplyTimeShift, TimeShiftRange |
If you set NumAugmentations to a value greater than 1, the object applies NumAugmentations random independent augmentations in parallel. The values of any probabilistically determined parameters are drawn independently for each of these augmentations.
Specified Independent Augmentations
To define your augmentation as independently applied augmentations with deterministic parameters, set AugmentationParameterSource to 'specify' and AugmentationMode to 'independent'.
In this pipeline configuration, these parameters apply:
Augmentation method | Parameters |
---|---|
Stretch time | ApplyTimeStretch, SpeedupFactor |
Shift pitch | ApplyPitchShift, SemitoneShift |
Control volume | ApplyVolumeControl, VolumeGain |
Add noise | ApplyAddNoise, SNR |
Shift time | ApplyTimeShift, TimeShift |
If you specify the parameter of an augmentation method as a vector, each element of the vector creates a separate branch in the augmentation pipeline. For example, the following object creates an augmentation pipeline that results in seven separate augmentations:
aug = audioDataAugmenter("AugmentationMode","independent", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.8,1.2], ...
    "VolumeGain",[-3,-1])
aug = 
  audioDataAugmenter with properties:
    AugmentationMode: "independent"
    AugmentationParameterSource: "specify"
    ApplyTimeStretch: 1
    SpeedupFactor: [0.8000 1.2000]
    ApplyPitchShift: 1
    SemitoneShift: -3
    ApplyVolumeControl: 1
    VolumeGain: [-3 -1]
    ApplyAddNoise: 1
    SNR: 5
    ApplyTimeShift: 1
    TimeShift: 0.0050
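Here the count is a sum rather than a product. In independent mode, each specified value of each enabled augmentation becomes its own branch, so 2 + 1 + 2 + 1 + 1 = 7. The same arithmetic in base MATLAB (numValues is a hypothetical name):

```matlab
% Values given per augmentation in the example above:
% SpeedupFactor [0.8 1.2], SemitoneShift -3, VolumeGain [-3 -1], SNR 5, TimeShift 0.005
numValues = [2 1 2 1 1];

% Independent: every value of every augmentation is its own branch, so the counts add
numIndependentAugmentations = sum(numValues);   % 7
```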
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
LockPhase must be set to false for the time stretching and pitch shifting augmentations.
Using gpuArray (Parallel Computing Toolbox) input with audioDataAugmenter is recommended only for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see ComputeCapability in the output from the gpuDevice (Parallel Computing Toolbox) function. For more information, see GPU Computing Requirements (Parallel Computing Toolbox).
For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2019b