augment audio data

since r2019b

description

enlarge your audio dataset using audio-specific augmentation techniques like pitch shifting, time-scale modification, time shifting, noise addition, and volume control. you can create cascaded or parallel augmentation pipelines to apply multiple algorithms deterministically or probabilistically.

creation

syntax

aug = audiodataaugmenter()

aug = audiodataaugmenter(name,value)

description

aug = audiodataaugmenter() creates an audio data augmenter object with default property values.

aug = audiodataaugmenter(name,value) specifies nondefault properties for aug using one or more name-value arguments.

properties

augmentation pipeline

`augmentationmode` — augmentation mode
`'sequential'` (default) | `'independent'`

augmentation mode, specified as 'sequential' or 'independent'.

'sequential' –– augmentation algorithms are applied sequentially (in series).
'independent' –– augmentation algorithms are applied independently (in parallel).

data types: char | string

`augmentationparametersource` — source of augmentation parameters
`'random'` (default) | `'specify'`

source of augmentation parameters, specified as 'random' or 'specify'.

'random' –– augmentation algorithms are applied probabilistically using a probability parameter and a range parameter.
for example, to create an audiodataaugmenter that applies time-stretching using a speedup factor between 0.5 and 1.5 with a 60% probability, enter the following in the command window:
```
aug = audiodataaugmenter('augmentationparametersource','random', ...
                         'timestretchprobability',0.6, ...
                         'speedupfactorrange',[0.5,1.5]);
```
when time-stretching is applied, the speedup factor is drawn from a uniform distribution centered at 1 (the mean of the range) with a minimum of 0.5 and a maximum of 1.5.
'specify' –– augmentation algorithms are applied deterministically using a logical parameter and a specified parameter value. for example, to create an audiodataaugmenter that applies time-stretching using a 1.5 speedup factor with a 100% probability, enter the following in the command window:
```
aug = audiodataaugmenter('augmentationparametersource','specify', ...
                         'applytimestretch',true, ...
                         'speedupfactor',1.5);
```

data types: char | string
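
the 'random' behavior described above can be pictured with base matlab, without the toolbox. the following is a minimal sketch, assuming one bernoulli trial decides whether the algorithm runs and one uniform draw picks the parameter. all variable names are hypothetical, and this is not the toolbox's internal implementation. a factor of 1 stands for "no time stretch".

```matlab
% sketch only: mimics the 'random' parameter source with base matlab.
% all variable names are hypothetical, not toolbox properties.
rng(0)                               % fix the seed so the run is repeatable
probability = 0.6;                   % plays the role of timestretchprobability
factorrange = [0.5 1.5];             % plays the role of speedupfactorrange
numtrials = 1e4;                     % number of simulated augmentations

applied = rand(numtrials,1) < probability;   % one bernoulli trial per augmentation
factor = ones(numtrials,1);                  % 1 means "no time stretch"
factor(applied) = factorrange(1) + diff(factorrange)*rand(nnz(applied),1);

fractionapplied = mean(applied);             % close to 0.6 for many trials
meanfactor = mean(factor(applied));          % close to 1, the middle of the range
```

with many trials, the fraction of applied stretches approaches the probability, and every drawn factor stays inside the specified range.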

`numaugmentations` — number of augmented signals to output
`1` (default) | positive integer

number of augmented signals to output, specified as a positive integer.

dependencies

to enable this property, set augmentationparametersource to 'random'.

stretch time

`timestretchprobability` — probability of applying time stretch
`0.5` (default) | scalar in the range [0, 1]

probability of applying time stretch, specified as a scalar in the range [0, 1]. set the probability to 1 to apply time stretching every time you call augment. set the probability to 0 to skip time stretching every time you call augment.

dependencies

to enable this property, set augmentationparametersource to 'random' and augmentationmode to 'sequential'.

data types: single | double

`speedupfactorrange` — range of time stretch speedup factor
`[0.8 1.2]` (default) | two-element row vector of positive nondecreasing values

range of time stretch speedup factor, specified as a two-element row vector of positive nondecreasing values.

dependencies

to enable this property, set augmentationparametersource to 'random'.

`applytimestretch` — apply time stretch
`true` (default) | `false`

apply time stretch, specified as true or false.

dependencies

to enable this property, set augmentationparametersource to 'specify'.

data types: logical

`speedupfactor` — time stretch speedup factor
`0.8` (default) | real positive scalar | real positive vector

time stretch speedup factor, specified as a scalar or vector of real positive values.

dependencies

to enable this property, set augmentationparametersource to 'specify'.

shift pitch

`pitchshiftprobability` — probability of applying pitch shift
`0.5` (default) | scalar in the range [0, 1]

probability of applying pitch shift, specified as a scalar in the range [0, 1]. set the probability to 1 to apply pitch shifting every time you call augment. set the probability to 0 to skip pitch shifting every time you call augment.

dependencies

to enable this property, set augmentationparametersource to 'random' and augmentationmode to 'sequential'.

data types: single | double

`semitoneshiftrange` — range of pitch shift (semitones)
`[-2,2]` (default) | two-element row vector of nondecreasing values

range of pitch shift in semitones, specified as a two-element row vector of nondecreasing values.

dependencies

to enable this property, set augmentationparametersource to 'random'.

`applypitchshift` — apply pitch shift
`true` (default) | `false`

apply pitch shift, specified as true or false.

dependencies

to enable this property, set augmentationparametersource to 'specify'.

data types: logical

`semitoneshift` — pitch shift (semitones)
`-3` (default) | real scalar | real vector

pitch shift in semitones, specified as a real scalar or vector.

dependencies

to enable this property, set augmentationparametersource to 'specify'.
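
a shift given in semitones corresponds to a frequency ratio. the following is a small sketch of that conversion, assuming twelve-tone equal temperament (12 semitones per octave). the variable names are hypothetical, and the sketch only computes the ratio; it does not perform pitch shifting.

```matlab
% sketch: frequency ratio implied by a pitch shift in semitones,
% assuming twelve-tone equal temperament (12 semitones per octave).
semitoneshift = [-2 -1 1 2 12];            % example shifts; 12 is one octave up
frequencyratio = 2.^(semitoneshift/12);    % ratio applied to every frequency

% where a 440 hz tone would land after each shift
shiftedhz = 440*frequencyratio;
```

a shift of 12 semitones doubles the frequency, and shifts of -2 and 2 semitones give reciprocal ratios.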

control volume

`volumecontrolprobability` — probability of applying volume control
`0.5` (default) | scalar in the range [0, 1]

probability of applying volume control, specified as a scalar in the range [0, 1]. set the probability to 1 to apply volume control every time you call augment. set the probability to 0 to skip volume control every time you call augment.

dependencies

to enable this property, set augmentationparametersource to 'random' and augmentationmode to 'sequential'.

data types: single | double

`volumegainrange` — range of volume gain (db)
`[-3,3]` (default) | two-element row vector of nondecreasing values

range of volume gain in db, specified as a two-element row vector of nondecreasing values.

dependencies

to enable this property, set augmentationparametersource to 'random'.

`applyvolumecontrol` — apply volume gain
`true` (default) | `false`

apply volume gain, specified as true or false.

dependencies

to enable this property, set augmentationparametersource to 'specify'.

data types: logical

`volumegain` — volume gain (db)
`-3` (default) | scalar | vector

volume gain in db, specified as a scalar or vector.
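
to make the db units concrete, the following sketch converts a gain in db to a linear factor and applies it to a test tone. it assumes the gain is an amplitude gain, so the usual 20*log10 convention applies; the signal and variable names are hypothetical, and this is not taken from the toolbox source.

```matlab
% sketch: apply a volume gain given in db, assuming an amplitude gain.
fs = 8000;                           % hypothetical sample rate (hz)
t = (0:fs-1).'/fs;                   % one second of time stamps
x = 0.5*sin(2*pi*440*t);             % hypothetical test tone
gaindb = -3;                         % same value as the volumegain default
lineargain = 10^(gaindb/20);         % db to linear amplitude factor, about 0.708
y = lineargain*x;                    % attenuated signal

% measure the gain back from the peak levels
measureddb = 20*log10(max(abs(y))/max(abs(x)));
```

positive values amplify the signal and negative values attenuate it; a gain of 0 db leaves the signal unchanged.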

add noise

`addnoiseprobability` — probability of applying noise addition
`0.5` (default) | scalar in the range [0, 1]

probability of applying gaussian white noise addition, specified as a scalar in the range [0, 1]. set the probability to 1 to add noise every time you call augment. set the probability to 0 to skip adding noise every time you call augment.

dependencies

to enable this property, set augmentationparametersource to 'random' and augmentationmode to 'sequential'.

data types: single | double

`snrrange` — range of noise addition snr (db)
`[0,10]` (default) | two-element row vector of nondecreasing values

range of noise addition snr in db, specified as a two-element row vector of nondecreasing values.

dependencies

to enable this property, set augmentationparametersource to 'random'.

`applyaddnoise` — apply noise addition
`true` (default) | `false`

apply gaussian white noise addition, specified as true or false.

dependencies

to enable this property, set augmentationparametersource to 'specify'.

data types: logical

`snr` — noise addition snr (db)
`5` (default) | scalar | vector

noise addition snr in db, specified as a scalar or vector.
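
one common way to add gaussian white noise at a target snr is to scale the noise power relative to the measured signal power. the sketch below shows that approach with base matlab; it is an assumption about how such an algorithm can work, not a description of the toolbox's implementation, and all names are hypothetical.

```matlab
% sketch: add gaussian white noise at a target snr (db).
rng(1)                                       % repeatable noise
fs = 8000;                                   % hypothetical sample rate (hz)
t = (0:fs-1).'/fs;
x = sin(2*pi*440*t);                         % hypothetical clean signal
targetsnr = 5;                               % db, same value as the snr default

signalpower = mean(x.^2);                    % average power of the clean signal
noisepower = signalpower/10^(targetsnr/10);  % power ratio from db
noise = sqrt(noisepower)*randn(size(x));     % white noise with that power
y = x + noise;                               % noisy signal

% check the snr that was actually realized
measuredsnr = 10*log10(signalpower/mean(noise.^2));
```

a lower snr means more noise relative to the signal; at 0 db the noise power equals the signal power.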

shift time

`timeshiftprobability` — probability of applying time shift
`0.5` (default) | scalar in the range [0, 1]

probability of applying time shift, specified as a scalar in the range [0, 1]. set the probability to 1 to apply time shifting every time you call augment. set the probability to 0 to skip time shifting every time you call augment.

time-shifting applies a circular shift on the time-domain audio data.

dependencies

to enable this property, set augmentationparametersource to 'random' and augmentationmode to 'sequential'.

data types: single | double

`timeshiftrange` — range of time shift (s)
`[-5e-3,5e-3]` (default) | two-element row vector of nondecreasing values

range of time shift in seconds, specified as a two-element row vector of nondecreasing values.

dependencies

to enable this property, set augmentationparametersource to 'random'.

`applytimeshift` — apply time shift
`true` (default) | `false`

apply time shift, specified as true or false.

dependencies

to enable this property, set augmentationparametersource to 'specify'.

time-shifting applies a circular shift on the time-domain audio data.

data types: logical

`timeshift` — time shift (s)
`5e-3` (default) | scalar | vector

time shift in seconds, specified as a scalar or vector.

dependencies

to enable this property, set augmentationparametersource to 'specify'.
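
the circular shift mentioned for the time shift properties can be illustrated with circshift. this is a minimal sketch, assuming the shift in seconds is converted to samples by rounding and that a positive value delays the signal; both conventions are assumptions, and the names are hypothetical.

```matlab
% sketch: circular time shift of a short signal.
fs = 1000;                             % hypothetical sample rate (hz)
x = (1:10).';                          % hypothetical 10-sample signal
shiftseconds = 3e-3;                   % time shift in seconds
shiftsamples = round(shiftseconds*fs); % 3 samples at this sample rate

% samples pushed past the end wrap around to the start
y = circshift(x,shiftsamples);

% shifting back by the same amount restores the original
xrestored = circshift(y,-shiftsamples);
```

because the shift is circular, the output has the same length as the input and no samples are discarded.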

object functions

addaugmentationmethod	add custom augmentation method
removeaugmentationmethod	remove custom augmentation method
augment	augment audio data
setaugmenterparams	set parameters of augmentation algorithm
getaugmenterparams	get parameters of augmentation algorithm

examples

apply random sequential augmentations

read in an audio signal and listen to it.

[audioin,fs] = audioread("counting-16-44p1-mono-15secs.wav");
sound(audioin,fs)

create an audiodataaugmenter object that applies time stretching, volume control, and time shifting in cascade. apply each of the augmentations with 80% probability. set numaugmentations to 5 to output five independently augmented signals. to skip pitch shifting and noise addition for each augmentation, set the respective probabilities to 0. define parameter ranges for each relevant augmentation algorithm.

augmenter = audiodataaugmenter( ...
    "augmentationmode","sequential", ...
    "numaugmentations",5, ...
    ...
    "timestretchprobability",0.8, ...
    "speedupfactorrange", [1.3,1.4], ...
    ...
    "pitchshiftprobability",0, ...
    ...
    "volumecontrolprobability",0.8, ...
    "volumegainrange",[-5,5], ...
    ...
    "addnoiseprobability",0, ...
    ...
    "timeshiftprobability",0.8, ...
    "timeshiftrange", [-500e-3,500e-3])

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: "sequential"
    augmentationparametersource: 'random'
               numaugmentations: 5
         timestretchprobability: 0.8000
             speedupfactorrange: [1.3000 1.4000]
          pitchshiftprobability: 0
       volumecontrolprobability: 0.8000
                volumegainrange: [-5 5]
            addnoiseprobability: 0
           timeshiftprobability: 0.8000
                 timeshiftrange: [-0.5000 0.5000]

call augment on the audio to create 5 augmentations. the augmented audio is returned in a table with variables audio and augmentationinfo. the number of rows in the table is defined by numaugmentations.

data = augment(augmenter,audioin,fs)

data=5×2 table
          audio          augmentationinfo
    _________________    ________________
    {685056x1 double}       1x1 struct   
    {685056x1 double}       1x1 struct   
    {505183x1 double}       1x1 struct   
    {685056x1 double}       1x1 struct   
    {490728x1 double}       1x1 struct

in the current augmentation pipeline, augmentation parameters are assigned randomly from within the specified ranges. to determine the exact parameters used for an augmentation, inspect augmentationinfo.

augmentationtoinspect = 4;
data.augmentationinfo(augmentationtoinspect)

ans = struct with fields:
    speedupfactor: 1
       volumegain: 4.3399
        timeshift: 0.4502

listen to the augmentation you are inspecting. plot the time-domain representation of the original and augmented signals.

augmentation = data.audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("original audio","augmented audio")
ylabel("amplitude")
xlabel("time (s)")

figure: time-domain plot of the original audio and augmented audio, amplitude versus time (s).

apply specified sequential augmentations

read in an audio signal and listen to it.

[audioin,fs] = audioread("counting-16-44p1-mono-15secs.wav");
sound(audioin,fs)

create an audiodataaugmenter object that applies time stretching, pitch shifting, and noise corruption in cascade. specify the time stretch speedup factors as 0.9, 1.1, and 1.2. specify the pitch shifting in semitones as -2, -1, 1, and 2. specify the noise corruption snr as 10 db and 15 db.

augmenter = audiodataaugmenter( ...
    "augmentationmode","sequential", ...
    "augmentationparametersource","specify", ...
    "speedupfactor",[0.9,1.1,1.2], ...
    "applytimestretch",true, ...
    "applypitchshift",true, ...
    "semitoneshift",[-2,-1,1,2], ...
    "snr",[10,15], ...
    "applyvolumecontrol",false, ...
    "applytimeshift",false)

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: "sequential"
    augmentationparametersource: "specify"
               applytimestretch: 1
                  speedupfactor: [0.9000 1.1000 1.2000]
                applypitchshift: 1
                  semitoneshift: [-2 -1 1 2]
             applyvolumecontrol: 0
                  applyaddnoise: 1
                            snr: [10 15]
                 applytimeshift: 0

call augment on the audio to create 24 augmentations. the augmentations represent every combination of the specified augmentation parameters ( $3 \times 4 \times 2 = 24$ ).

data = augment(augmenter,audioin,fs)

data=24×2 table
          audio          augmentationinfo
    _________________    ________________
    {761243x1 double}       1x1 struct   
    {622888x1 double}       1x1 struct   
    {571263x1 double}       1x1 struct   
    {761243x1 double}       1x1 struct   
    {622888x1 double}       1x1 struct   
    {571263x1 double}       1x1 struct   
    {761243x1 double}       1x1 struct   
    {622888x1 double}       1x1 struct   
    {571263x1 double}       1x1 struct   
    {761243x1 double}       1x1 struct   
    {622888x1 double}       1x1 struct   
    {571263x1 double}       1x1 struct   
    {761243x1 double}       1x1 struct   
    {622888x1 double}       1x1 struct   
    {571263x1 double}       1x1 struct   
    {761243x1 double}       1x1 struct   
      ⋮

you can check the parameter configuration of each augmentation using the augmentationinfo table variable.

augmentationtoinspect = 1;
data.augmentationinfo(augmentationtoinspect)

ans = struct with fields:
    speedupfactor: 0.9000
    semitoneshift: -2
              snr: 10
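
the count of 24 comes from taking every combination of the three parameter vectors. the sketch below enumerates those combinations with ndgrid. the row order is an assumption: the first row matches the parameters reported above, but the order in which augment returns the remaining rows is not confirmed here. the variable names are hypothetical.

```matlab
% sketch: enumerate every combination of the specified parameter values.
speedupfactor = [0.9 1.1 1.2];     % 3 values
semitoneshift = [-2 -1 1 2];       % 4 values
snrdb = [10 15];                   % 2 values

% ndgrid builds the full grid; the first input varies fastest when flattened
[s,p,n] = ndgrid(speedupfactor,semitoneshift,snrdb);
combinations = [s(:) p(:) n(:)];   % one row per parameter combination

numcombinations = size(combinations,1);   % 3*4*2
firstcombination = combinations(1,:);     % speedup 0.9, shift -2, snr 10
```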

listen to the augmentation you are inspecting. plot the time-domain representation of the original and augmented signals.

augmentation = data.audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("original audio","augmented audio")
ylabel("amplitude")
xlabel("time (s)")

figure: time-domain plot of the original audio and augmented audio, amplitude versus time (s).

apply random independent augmentations

read in an audio signal.

[audioin,fs] = audioread("counting-16-44p1-mono-15secs.wav");

create an audiodataaugmenter object that applies noise corruption and time shifting in parallel branches. for the noise corruption branch, randomly apply noise with an snr in the range 0 db to 20 db. for the time shifting branch, randomly apply time shifting in the range -300 ms to 300 ms. apply augmentation 2 times for each branch, for 4 total augmentations.

augmenter = audiodataaugmenter( ...
    "augmentationmode","independent", ...
    "augmentationparametersource","random", ...
    "numaugmentations",2, ...
    "applytimestretch",false, ...
    "applypitchshift",false, ...
    "applyvolumecontrol",false, ...
    "snrrange",[0,20], ...
    "timeshiftrange",[-300e-3,300e-3])

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: "independent"
    augmentationparametersource: "random"
               numaugmentations: 2
               applytimestretch: 0
                applypitchshift: 0
             applyvolumecontrol: 0
                  applyaddnoise: 1
                       snrrange: [0 20]
                 applytimeshift: 1
                 timeshiftrange: [-0.3000 0.3000]

call augment on the audio to create 4 augmentations (2 for each of the 2 branches).

data = augment(augmenter,audioin,fs);

you can check the parameter configuration of each augmentation using the augmentationinfo table variable.

augmentationtoinspect = 4;
data.augmentationinfo{augmentationtoinspect}

ans = struct with fields:
    timeshift: 0.0016

listen to the audio you are inspecting. plot the time-domain representation of the original and augmented signals.

augmentation = data.audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("original audio","augmented audio")
ylabel("amplitude")
xlabel("time (s)")

figure: time-domain plot of the original audio and augmented audio, amplitude versus time (s).

apply specified independent augmentations

read in an audio signal.

[audioin,fs] = audioread("counting-16-44p1-mono-15secs.wav");

create an audiodataaugmenter object that applies volume control, noise corruption, and time shifting in parallel branches.

augmenter = audiodataaugmenter( ...
    "augmentationmode","independent", ...
    "augmentationparametersource","specify", ...
    "applytimestretch",false, ...
    "applypitchshift",false, ...
    "volumegain",2, ...
    "snr",0, ...
    "timeshift",2)

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: "independent"
    augmentationparametersource: "specify"
               applytimestretch: 0
                applypitchshift: 0
             applyvolumecontrol: 1
                     volumegain: 2
                  applyaddnoise: 1
                            snr: 0
                 applytimeshift: 1
                      timeshift: 2

call augment on the audio to create 3 augmentations.

data = augment(augmenter,audioin,fs)

data=3×2 table
          audio          augmentationinfo
    _________________    ________________
    {685056x1 double}      {1x1 struct}  
    {685056x1 double}      {1x1 struct}  
    {685056x1 double}      {1x1 struct}

you can check the parameter configuration of each augmentation using the augmentationinfo table variable.

augmentationtoinspect = 3;
data.augmentationinfo{augmentationtoinspect}

ans = struct with fields:
    timeshift: 2

listen to the audio you are inspecting. plot the time-domain representations of the original and augmented signals.

augmentation = data.audio{augmentationtoinspect};
sound(augmentation,fs)
t = (0:(numel(audioin)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioin,taug,augmentation)
legend("original audio","augmented audio")
ylabel("amplitude")
xlabel("time (s)")

figure: time-domain plot of the original audio and augmented audio, amplitude versus time (s).

augment audio dataset

the audiodataaugmenter supports multiple workflows for augmenting your datastore, including:

offline augmentation
augmentation using tall arrays
augmentation using transform datastores

in each workflow, begin by creating an audio datastore to point to your audio data. in this example, you create an audio datastore that points to audio samples included with audio toolbox™. count the number of files in the dataset.

folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audiodatastore(folder)

ads = 
  audiodatastore with properties:
                       files: {
                              ' ...\matlab\toolbox\audio\samples\ambiance-16-44p1-mono-12secs.wav';
                              ' ...\matlab\toolbox\audio\samples\audioarray-16-16-4channels-20secs.wav';
                              ' ...\toolbox\audio\samples\churchimpulseresponse-16-44p1-mono-5secs.wav'
                               ... and 26 more
                              }
    alternatefilesystemroots: {}
              outputdatatype: 'double'
                      labels: {}

numfilesindataset = numel(ads.files)

numfilesindataset = 29

create an audiodataaugmenter that applies random sequential augmentations. set numaugmentations to 2.

aug = audiodataaugmenter('numaugmentations',2)

aug = 
  audiodataaugmenter with properties:
               augmentationmode: 'sequential'
    augmentationparametersource: 'random'
               numaugmentations: 2
         timestretchprobability: 0.5000
             speedupfactorrange: [0.8000 1.2000]
          pitchshiftprobability: 0.5000
             semitoneshiftrange: [-2 2]
       volumecontrolprobability: 0.5000
                volumegainrange: [-3 3]
            addnoiseprobability: 0.5000
                       snrrange: [0 10]
           timeshiftprobability: 0.5000
                 timeshiftrange: [-0.0050 0.0050]

offline augmentation

to augment the audio dataset, create two augmentations of each file and then write the augmentations as wav files.

while hasdata(ads)
    [audioin,info] = read(ads);
    
    data = augment(aug,audioin,info.samplerate);
    
    [~,fn] = fileparts(info.filename);
    for i = 1:size(data,1)
        augmentedaudio = data.audio{i};
        
        % if augmentation caused an audio signal to have values outside of -1 and 1, 
        % normalize the audio signal to avoid clipping when writing.
        if max(abs(augmentedaudio),[],'all')>1
            augmentedaudio = augmentedaudio/max(abs(augmentedaudio),[],'all');
        end
        
        audiowrite(sprintf('%s_aug%d.wav',fn,i),augmentedaudio,info.samplerate)
    end
end
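
the clipping guard inside the loop can also be written as a single expression. the following sketch uses a hypothetical helper, normalizeifclipping, that divides by the peak only when the peak exceeds 1; it is an equivalent rewrite of the if block above for illustration, tested here on made-up matrices rather than real audio.

```matlab
% hypothetical helper: scale only when the peak magnitude exceeds 1.
% max(1,peak) equals 1 for in-range signals, so they pass through unchanged.
normalizeifclipping = @(y) y./max(1,max(abs(y(:))));

quiet = [0.2 -0.5; 0.1 0.3];     % already within [-1, 1]
loud = [0.5 -2; 1.5 0.25];       % peak magnitude of 2

quietout = normalizeifclipping(quiet);   % same as the input
loudout = normalizeifclipping(loud);     % every sample divided by 2
```

scaling by the peak across all channels keeps the balance between channels of a multichannel signal.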

create an audiodatastore that points to the augmented dataset and confirm that the number of files in the dataset is double the original number of files.

augmentedads = audiodatastore(pwd)

augmentedads = 
  audiodatastore with properties:
                       files: {
                              ' ...\examples\audio-ex28074079\ambiance-16-44p1-mono-12secs_aug1.wav';
                              ' ...\examples\audio-ex28074079\ambiance-16-44p1-mono-12secs_aug2.wav';
                              ' ...\examples\audio-ex28074079\audioarray-16-16-4channels-20secs_aug1.wav'
                               ... and 55 more
                              }
    alternatefilesystemroots: {}
              outputdatatype: 'double'
                      labels: {}

numfilesinaugmenteddataset = numel(augmentedads.files)

numfilesinaugmenteddataset = 58

augment using tall arrays

when augmenting a dataset using tall arrays, the input data to the augmenter should be sampled at a consistent rate. subset the original audio dataset to only include files with a sample rate of 44.1 khz. most datasets are already cleaned to have a consistent sample rate.

keepfile = cellfun(@(x)contains(x,'44p1'),ads.files);
ads44p1 = subset(ads,keepfile);
fs = 44.1e3;

convert the audio datastore to a tall array. tall arrays are evaluated only when you request them explicitly using gather. matlab® automatically optimizes the queued calculations by minimizing the number of passes through the data. if you have the parallel computing toolbox™, you can spread the calculations across multiple machines. the audio data is represented as an m-by-1 tall cell array, where m is the number of files in the audio datastore.

adstall = tall(ads44p1)

starting parallel pool (parpool) using the 'local' profile ...
connected to the parallel pool (number of workers: 6).
adstall =
  m×1 tall cell array
    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :         :
        :         :

define a cellfun function so that augmentation is applied to each cell of the tall array. call gather to evaluate the tall array.

augtall = cellfun(@(x)augment(aug,x,fs),adstall,"uniformoutput",false);
augmenteddataset = gather(augtall)

evaluating tall expression using the parallel pool 'local':
- pass 1 of 1: completed in 1 min 34 sec
evaluation completed in 1 min 34 sec

augmenteddataset=12×1 cell array
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}
    {2×2 table}

the augmented dataset is returned as a numfiles-by-1 cell array, where numfiles is the number of files in the datastore. each element of the cell array is a numaugmentationsperfile-by-2 table, where numaugmentationsperfile is the number of augmentations returned per file.

numfiles = numel(augmenteddataset)

numfiles = 12

numaugmentationsperfile = size(augmenteddataset{1},1)

numaugmentationsperfile = 2
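
if a single table is more convenient than a cell array of tables, the per-file tables can be stacked with vertcat. the sketch below uses mock tables in place of the gathered output, so the variable types (cell columns holding signals and structs) and all names are assumptions made for illustration.

```matlab
% sketch: stack per-file augmentation tables into one table.
% mock data stands in for the gathered output of the tall-array workflow.
numfiles = 3;
mockdataset = cell(numfiles,1);
for k = 1:numfiles
    audio = {zeros(10,1); zeros(12,1)};            % two mock signals per file
    augmentationinfo = {struct('speedupfactor',1); ...
                        struct('speedupfactor',1.1)};
    mockdataset{k} = table(audio,augmentationinfo);   % 2-by-2 table
end

combined = vertcat(mockdataset{:});       % stack the tables row-wise
fileindex = repelem((1:numfiles).',2);    % source file of each stacked row
```

keeping a file index next to the stacked table preserves the link between each augmentation and its source file.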

augment using transform datastore

you can perform online data augmentation while you train your machine learning application using a transform datastore. call transform to create a new datastore that applies data augmentation while reading.

transformads = transform(ads,@(x,info)augment(aug,x,info),'includeinfo',true)

transformads = 
  transformeddatastore with properties:
    underlyingdatastore: [1×1 audiodatastore]
             transforms: {@(x,info)augment(aug,x,info)}
            includeinfo: 1

call read to return the augmented first file from the transform datastore.

augmentedread = read(transformads)

augmentedread=2×2 table
          audio          augmentationinfo
    _________________    ________________
    {539648×1 double}      [1×1 struct]  
    {586683×1 double}      [1×1 struct]

add custom augmentation method

you can expand the capabilities of audiodataaugmenter by adding custom augmentation methods.

read in an audio signal and listen to it.

[audioin,fs] = audioread('counting-16-44p1-mono-15secs.wav');
sound(audioin,fs)

create an audiodataaugmenter object. set the probability of applying white noise to 0.

augmenter = audiodataaugmenter('addnoiseprobability',0)

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: 'sequential'
    augmentationparametersource: 'random'
               numaugmentations: 1
         timestretchprobability: 0.5000
             speedupfactorrange: [0.8000 1.2000]
          pitchshiftprobability: 0.5000
             semitoneshiftrange: [-2 2]
       volumecontrolprobability: 0.5000
                volumegainrange: [-3 3]
            addnoiseprobability: 0
           timeshiftprobability: 0.5000
                 timeshiftrange: [-0.0050 0.0050]

specify a custom augmentation algorithm that applies pink noise. the addpinknoise algorithm is added to the augmenter properties.

algorithmname = 'addpinknoise';
algorithmhandle = @(x)x+pinknoise(size(x),'like',x);
addaugmentationmethod(augmenter,algorithmname,algorithmhandle)
augmenter

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: 'sequential'
    augmentationparametersource: 'random'
               numaugmentations: 1
         timestretchprobability: 0.5000
             speedupfactorrange: [0.8000 1.2000]
          pitchshiftprobability: 0.5000
             semitoneshiftrange: [-2 2]
       volumecontrolprobability: 0.5000
                volumegainrange: [-3 3]
            addnoiseprobability: 0
           timeshiftprobability: 0.5000
                 timeshiftrange: [-0.0050 0.0050]
        addpinknoiseprobability: 0.5000

set the probability of adding pink noise to 1.

augmenter.addpinknoiseprobability = 1

augmenter = 
  audiodataaugmenter with properties:
               augmentationmode: 'sequential'
    augmentationparametersource: 'random'
               numaugmentations: 1
         timestretchprobability: 0.5000
             speedupfactorrange: [0.8000 1.2000]
          pitchshiftprobability: 0.5000
             semitoneshiftrange: [-2 2]
       volumecontrolprobability: 0.5000
                volumegainrange: [-3 3]
            addnoiseprobability: 0
           timeshiftprobability: 0.5000
                 timeshiftrange: [-0.0050 0.0050]
        addpinknoiseprobability: 1

augment the original signal and listen to the result. inspect parameters of the augmentation algorithms applied.

data = augment(augmenter,audioin,fs);
sound(data.audio{1},fs)
data.augmentationinfo(1)

ans = struct with fields:
    speedupfactor: 1
    semitoneshift: 0
       volumegain: 2.4803
        timeshift: -0.0022
     addpinknoise: 'applied'

plot the mel spectrograms of the original and augmented signals.

melspectrogram(audioin,fs)
title('original signal')

figure: mel spectrogram of the original signal, frequency (khz) versus time (s).

melspectrogram(data.audio{1},fs)
title('augmented signal')

figure: mel spectrogram of the augmented signal, frequency (khz) versus time (s).

algorithms

the audiodataaugmenter object enables you to configure your augmentation pipeline as deterministic or probabilistic using the augmentationparametersource property. you can also choose to apply the augmentations in series or in parallel using the augmentationmode property. the following sections describe the pipelines you can create and the applicable properties for each architecture.

random sequential augmentations

to define your augmentation as a sequence of probabilistically applied augmentations, set augmentationparametersource to 'random' and augmentationmode to 'sequential'.

the order that augmentations are applied is always the same. if you specify custom algorithms, they are applied at the end of the sequence, in the order you specified them.

in this pipeline configuration, these parameters apply:

augmentation method	parameters
stretch time	timestretchprobability speedupfactorrange
shift pitch	pitchshiftprobability semitoneshiftrange
control volume	volumecontrolprobability volumegainrange
add noise	addnoiseprobability snrrange
shift time	timeshiftprobability timeshiftrange

if you specify numaugmentations as greater than 1, then the object applies numaugmentations parallel random sequential augmentations. the probability of applying an augmentation, and the value of any parameters that are probabilistically determined, are independent.

specified sequential augmentations

to define your augmentation as a sequence of deterministically applied augmentations, set augmentationparametersource to 'specify' and augmentationmode to 'sequential'.

the order that augmentations are applied is always the same. if you specify custom algorithms, they are applied at the end of the sequence, in the order you specified them.

in this pipeline configuration, these parameters apply:

augmentation method	parameters
stretch time	applytimestretch speedupfactor
shift pitch	applypitchshift semitoneshift
control volume	applyvolumecontrol volumegain
add noise	applyaddnoise snr
shift time	applytimeshift timeshift

if you specify an augmentation parameter as a vector, then each element of the vector creates a separate branch in the augmentation pipeline. for example, the following object creates an augmentation pipeline that results in four separate augmentations (two speedup factors times two volume gains):

```
aug = audioDataAugmenter("AugmentationMode","sequential", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.8,1.2], ...
    "VolumeGain",[-3,-1])
```
```
aug = 
  audioDataAugmenter with properties:
               AugmentationMode: "sequential"
    AugmentationParameterSource: "specify"
               ApplyTimeStretch: 1
                  SpeedupFactor: [0.8000 1.2000]
                ApplyPitchShift: 1
                  SemitoneShift: -3
             ApplyVolumeControl: 1
                     VolumeGain: [-3 -1]
                  ApplyAddNoise: 1
                            SNR: 5
                 ApplyTimeShift: 1
                      TimeShift: 0.0050
```
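one way to check the branch count is to run the object on a signal and count the rows of the returned table. the sketch below assumes a synthetic test tone and sample rate that are not part of this page. reading the four branches as the two speedup factors combined with the two volume gains is an interpretation of the text above.

```matlab
% illustrative input: a one-second 440 Hz tone (assumed, not from this page)
fs = 16e3;
t = (0:fs-1).'/fs;
audioIn = 0.5*sin(2*pi*440*t);

% same configuration as above: two speedup factors and two volume gains
aug = audioDataAugmenter("AugmentationMode","sequential", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.8,1.2], ...
    "VolumeGain",[-3,-1]);

% each row of the returned table is one branch of the pipeline
data = augment(aug,audioIn,fs);
numBranches = height(data)

% display the parameter values used on each branch
for k = 1:numBranches
    disp(data.AugmentationInfo(k))
end
```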

random independent augmentations

to define your augmentation as independently applied augmentations with randomly determined parameters, set augmentationparametersource to 'random' and augmentationmode to 'independent'.

in this pipeline configuration, these parameters apply:

| augmentation method | parameters |
| --- | --- |
| stretch time | applytimestretch, speedupfactorrange |
| shift pitch | applypitchshift, semitoneshiftrange |
| control volume | applyvolumecontrol, volumegainrange |
| add noise | applyaddnoise, snrrange |
| shift time | applytimeshift, timeshiftrange |

if you set numaugmentations to a value greater than 1, the object applies numaugmentations random independent augmentations in parallel. the values of any randomly drawn parameters are determined independently for each one.
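the following sketch shows one possible random independent configuration. the test tone, sample rate, and ranges are assumptions chosen for illustration, and time stretching and pitch shifting are switched off only to keep the output short. the code makes no claim about how many rows come back. it prints the augmentation information for each row so that the branches can be inspected directly.

```matlab
% illustrative input: a one-second 440 Hz tone (assumed, not from this page)
fs = 16e3;
t = (0:fs-1).'/fs;
audioIn = 0.5*sin(2*pi*440*t);

% independent branches with randomly drawn parameter values
aug = audioDataAugmenter( ...
    "AugmentationMode","independent", ...
    "AugmentationParameterSource","random", ...
    "ApplyTimeStretch",false, ...
    "ApplyPitchShift",false, ...
    "VolumeGainRange",[-6 0], ...
    "SNRRange",[10 20], ...
    "TimeShiftRange",[-0.01 0.01]);

data = augment(aug,audioIn,fs);

% display the parameter values drawn for each returned signal
for k = 1:height(data)
    disp(data.AugmentationInfo(k))
end
```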

specified independent augmentations

to define your augmentation as independently applied augmentations with specified parameters, set augmentationparametersource to 'specify' and augmentationmode to 'independent'.

in this pipeline configuration, these parameters apply:

| augmentation method | parameters |
| --- | --- |
| stretch time | applytimestretch, speedupfactor |
| shift pitch | applypitchshift, semitoneshift |
| control volume | applyvolumecontrol, volumegain |
| add noise | applyaddnoise, snr |
| shift time | applytimeshift, timeshift |

for example, the following object specifies vector values for the speedup factor and the volume gain:

```
aug = audioDataAugmenter("AugmentationMode","independent", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.8,1.2], ...
    "VolumeGain",[-3,-1])
```
```
aug = 
  audioDataAugmenter with properties:
               AugmentationMode: "independent"
    AugmentationParameterSource: "specify"
               ApplyTimeStretch: 1
                  SpeedupFactor: [0.8000 1.2000]
                ApplyPitchShift: 1
                  SemitoneShift: -3
             ApplyVolumeControl: 1
                     VolumeGain: [-3 -1]
                  ApplyAddNoise: 1
                            SNR: 5
                 ApplyTimeShift: 1
                      TimeShift: 0.0050
```
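to see what the independent branches of such an object produce, a sketch like the following can be run on a test signal. the tone and sample rate are illustrative assumptions. the code compares the length of each output with the input, which is one simple way to spot the time-stretched branches, and it leaves the number of rows to be read from the result rather than asserting it.

```matlab
% illustrative input: a one-second 440 Hz tone (assumed, not from this page)
fs = 16e3;
t = (0:fs-1).'/fs;
audioIn = 0.5*sin(2*pi*440*t);

aug = audioDataAugmenter("AugmentationMode","independent", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.8,1.2], ...
    "VolumeGain",[-3,-1]);

data = augment(aug,audioIn,fs);

% length of each augmented signal relative to the input length
relativeLength = cellfun(@(x) size(x,1),data.Audio)/size(audioIn,1)

% display the parameter values recorded for each returned signal
for k = 1:height(data)
    disp(data.AugmentationInfo(k))
end
```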

references

[1] salamon, justin, and juan pablo bello. "deep convolutional neural networks and data augmentation for environmental sound classification." ieee signal processing letters. vol. 24, issue 3, 2017.

extended capabilities

gpu arrays
accelerate code by running on a graphics processing unit (gpu) using parallel computing toolbox™.

usage notes and limitations:

lockphase must be set to false for the time stretching and pitch shifting augmentations.
using gpuarray (parallel computing toolbox) input with audiodataaugmenter is recommended only for a gpu with compute capability 7.0 ("volta") or above. other hardware might not offer any performance advantage. to check your gpu compute capability, see computecapability in the output of the gpudevice (parallel computing toolbox) function. for more information, see gpu computing requirements (parallel computing toolbox).

for an overview of gpu usage in matlab®, see run matlab functions on a gpu (parallel computing toolbox).
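the compute capability check described above could be scripted roughly as follows. this is a sketch under stated assumptions: the test tone and sample rate are invented for illustration, and time stretching, pitch shifting, and time shifting are given zero probability so that the lockphase restriction does not come into play. the code falls back to the cpu when no suitable gpu is found.

```matlab
% illustrative input: a one-second 440 Hz tone (assumed, not from this page)
fs = 16e3;
t = (0:fs-1).'/fs;
audioIn = 0.5*sin(2*pi*440*t);

% apply only volume control and noise addition
aug = audioDataAugmenter( ...
    "TimeStretchProbability",0, ...
    "PitchShiftProbability",0, ...
    "TimeShiftProbability",0, ...
    "VolumeControlProbability",1, ...
    "AddNoiseProbability",1);

% use the gpu only if one is available with compute capability 7.0 or above
useGPU = false;
if canUseGPU
    d = gpuDevice;
    useGPU = str2double(d.ComputeCapability) >= 7.0;
end

if useGPU
    data = augment(aug,gpuArray(audioIn),fs);
else
    data = augment(aug,audioIn,fs);
end
```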

version history

introduced in r2019b

