Label Audio Using Audio Labeler
Note

Audio Labeler will be removed in a future release. Use the Signal Labeler app instead.
The Audio Labeler app enables you to interactively define and visualize ground-truth labels for audio data sets. This example shows how you can create label definitions and then interactively label a set of audio files. The example also shows how to export the labeled ground-truth data, which you can then use with audioDatastore to train a machine learning system.
Load Unlabeled Data
To open the Audio Labeler, at the MATLAB® command prompt, enter:

audioLabeler
This example uses the audio files included with Audio Toolbox™. To locate the file path on your system, at the MATLAB command prompt, enter:

fullfile(matlabroot,'toolbox','audio','samples')
To load audio from a file, click Load > Audio Folders and select the folder containing the audio files you want to label.
Define and Assign Labels

File-Level Labels
The audio samples include music, speech, and ambience. To create a file-level label that defines the contents of an audio file as music, speech, ambience, or unknown, add a new file label definition. Specify the Label Name as Content, the Data Type as categorical, and the Categories as music, speech, ambience, and unknown. Set the Default value of the label definition to unknown.
All audio files in the Data Browser are now associated with the Content label name. To listen to the audio file selected in the Data Browser and confirm that it is a music file, click the play button. To set the value of the Content label, click unknown in the File Labels panel and select music from the drop-down menu.
The selected audio file now has the label name Content with value music assigned to it. You can continue setting the Content value for each file by selecting a file in the Data Browser and then selecting a value from the File Labels panel.
Region-Level Labels
You can define region-level labels manually or by using the provided automated algorithms. Audio Toolbox includes automatic labeling algorithms for speech detection and speech-to-text transcription.
Note

To enable automatic speech-to-text transcription, you must download and set up the speech-to-text transcription functionality. Once you do, the Speech to Text automation algorithm appears as an option on the toolstrip.
Select Counting-16-44p1-mono-15secs.wav from the Data Browser.
To create a region-level label that indicates whether speech is detected, first select Speech Detector from the Automation section. You can control the speech detection algorithm using the Window Length (s) and Merge Regions Within (s) parameters. Use the default parameters for the speech detection algorithm. To create an ROI label and label regions of the selected audio file, select Run.
Close the Speech Detector tab. You can correct or fine-tune the automatically generated SpeechDetected regions by selecting the ROI from the ROI bar and then dragging the edges of the region. The ROI bar is directly to the right of the ROI label. When a region is selected, clicking the play button plays only the selected region, enabling you to verify whether the selected region captures all relevant auditory information.
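If you prefer to generate comparable regions programmatically, Audio Toolbox also provides the detectSpeech function (introduced in R2020a). This sketch is not the app's exact algorithm, only a related command-line counterpart; it assumes the sample file is on your MATLAB path:

```matlab
% Read the audio file used in this example.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');

% Detect regions of speech. detectSpeech returns an N-by-2 matrix of
% sample indices marking the beginning and end of each detected region.
idx = detectSpeech(audioIn,fs);

% Convert the sample indices to seconds to compare the regions with
% those displayed in the Audio Labeler.
regionsInSeconds = (idx-1)/fs
```

You can then fine-tune the detected boundaries in the app, just as with the regions generated by the Speech Detector automation algorithm.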
If you have set up a speech-to-text transcription service, select Speech to Text from the Automation section. You can control the speech-to-text transcription using name-value pair options specific to your selected service. This example uses the IBM® service and specifies no additional options.
The ROI labels returned from the transcription service are strings with beginning and end points. The beginning and end points do not exactly correspond to the beginning and end points of the manually corrected speech detection regions. You can correct the endpoints of the SpeechContent ROI label by selecting the region and then dragging its edges. The transcription service misclassified the words "two" as "to," "four" as "for," and "ten" as "then." You can correct a string by selecting the region and then entering a new string.
Create another region-level label in the ROI Labels panel. Set Label Name to VUV, set Data Type to categorical, and set Categories to voiced and unvoiced.
By default, the waveform viewer shows the entire file. To display tools for zooming and panning, hover over the top right corner of the plot. Zoom in on the first five seconds of the audio file.
When you select a region in the plot and then hover over either of the two ROI bars, a shadow of the region appears. To assign the selected region to the category voiced, click the region labeled one on the SpeechContent label bar. Hover over the VUV label bar, click the shadow, and choose voiced.
The next two words, "two" and "three," contain both voiced and unvoiced speech. Select each region of speech on the plot, hover over the VUV label bar, and select the correct category for that region.
Export Label Definitions

You can export label definitions as a MAT file or as a MATLAB script. Maintaining label definitions enables consistent labeling between users and sessions. Select Export > Label Definitions > To File.

The label definitions are saved as an array of signalLabelDefinition objects. In your next session, you can import the label definitions by selecting Import > Label Definitions > From File.
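You can also inspect an exported definitions file from the command line. A minimal sketch, assuming you saved the definitions to a file named labelDefinitions.mat (a hypothetical file name; the variable name inside the MAT file depends on what you chose when exporting):

```matlab
% Load the exported label definitions into a struct. The MAT file
% contains an array of signalLabelDefinition objects.
S = load('labelDefinitions.mat');

% List the variable names stored in the file, then examine the
% definitions array stored under the first variable name.
names = fieldnames(S);
defs = S.(names{1})
```

Inspecting the definitions this way lets you confirm the label names, data types, and categories before sharing the file with other users.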
Export Labeled Audio Data

You can export the labeled signal set to a file or to your workspace. Select Export > Labels > To Workspace.
The Audio Labeler creates a labeledSignalSet object named labeledSet_HHMMSS, where HHMMSS is the time the object is created in hours, minutes, and seconds.
labeledSet_104620

labeledSet_104620 =

  labeledSignalSet with properties:

             Source: {29×1 cell}
         NumMembers: 29
    TimeInformation: "inherent"
             Labels: [29×4 table]
        Description: ""

 Use labelDefinitionsHierarchy to see a list of labels and sublabels.
 Use setLabelValue to add data to the set.
The labels you created are saved as a table to the Labels property.
labeledSet_104620.Labels
ans =

  29×4 table

                                                                                                Content     SpeechDetected    SpeechContent        VUV
                                                                                                ________    ______________    _____________    ___________

    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav               ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav          ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav   unknown     { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Click-16-44p1-mono-0.2secs.wav                 ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav               speech      {10×2 table}    {10×2 table}    {5×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Engine-16-44p1-stereo-20sec.wav                ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FemaleSpeech-16-8-mono-3secs.wav               speech      { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-44p1-stereo-25secs.mp3              music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-48-stereo-25secs.mp3                music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Heli_16ch_ACN_SN3D.wav                         ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\JetAirplane-16-11p025-mono-16secs.wav          ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Laughter-16-8-mono-4secs.wav                   ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\MainStreetOne-24-96-stereo-63secs.wav          ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\NoisySpeech-16-22p5-mono-5secs.wav             speech      { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Rainbow-16-8-mono-114secs.wav                  speech      { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RainbowNoisy-16-8-mono-114secs.wav             speech      { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RandomOscThree-24-96-stereo-13secs.aif         music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-44p1-stereo-11secs.mp3               music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-48-stereo-11secs.mp3                 music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-44p1-stereo-72secs.wav           music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-96-stereo-72secs.flac            music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SoftGuitar-44p1_mono-10mins.ogg                music       { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SpeechDFT-16-8-mono-5secs.wav                  speech      { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\TrainWhistle-16-44p1-mono-9secs.wav            ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Turbine-16-44p1-mono-22secs.wav                ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-44p1-stereo-10secs.wav       ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-1000secs.wav          ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-200secs.wav           ambience    { 0×2 table}    { 0×2 table}    {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WaveGuideLoopOne-24-96-stereo-10secs.aif       music       { 0×2 table}    { 0×2 table}    {0×2 table}
The file names associated with the labels are saved as a cell array to the Source property.
labeledSet_104620.Source
ans =

  29×1 cell array

    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'}
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Click-16-44p1-mono-0.2secs.wav'              }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Engine-16-44p1-stereo-20sec.wav'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FemaleSpeech-16-8-mono-3secs.wav'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-44p1-stereo-25secs.mp3'           }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-48-stereo-25secs.mp3'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Heli_16ch_ACN_SN3D.wav'                      }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\JetAirplane-16-11p025-mono-16secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Laughter-16-8-mono-4secs.wav'                }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\MainStreetOne-24-96-stereo-63secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\NoisySpeech-16-22p5-mono-5secs.wav'          }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Rainbow-16-8-mono-114secs.wav'               }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RainbowNoisy-16-8-mono-114secs.wav'          }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RandomOscThree-24-96-stereo-13secs.aif'      }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-44p1-stereo-11secs.mp3'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-48-stereo-11secs.mp3'              }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-44p1-stereo-72secs.wav'        }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-96-stereo-72secs.flac'         }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SoftGuitar-44p1_mono-10mins.ogg'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SpeechDFT-16-8-mono-5secs.wav'               }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\TrainWhistle-16-44p1-mono-9secs.wav'         }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Turbine-16-44p1-mono-22secs.wav'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-44p1-stereo-10secs.wav'    }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-1000secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-200secs.wav'        }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WaveGuideLoopOne-24-96-stereo-10secs.aif'    }
Prepare Audio Datastore for Deep Learning Workflow
To continue on to a deep learning or machine learning workflow, use audioDatastore. Using an audio datastore enables you to apply capabilities that are common to machine learning applications, such as countEachLabel and splitEachLabel. splitEachLabel enables you to split your data into train and test sets.
Create an audio datastore for your labeled signal set. Specify the location of the audio files as the first argument of audioDatastore and set the Labels property of audioDatastore to the Labels property of the labeled signal set.
ads = audioDatastore(labeledSet_104620.Source,'Labels',labeledSet_104620.Labels)
ads =

  audioDatastore with properties:

                       Files: {
                              ' ...\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav';
                              ' ...\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav';
                              ' ...\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'
                               ... and 26 more
                              }
                      Labels: 29-by-4 table
    AlternateFileSystemRoots: {}
              OutputDataType: 'double'
Call countEachLabel and specify the Content table variable to count the number of files that are labeled as ambience, music, speech, or unknown.
countEachLabel(ads,'TableVariable','Content')
ans =

  4×2 table

    Content     Count
    ________    _____

    ambience     13
    music         9
    speech        6
    unknown       1
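With the datastore in place, splitEachLabel divides the files into train and test sets while preserving the label proportions. A minimal sketch, assuming ads exists as created above and that splitEachLabel accepts the same 'TableVariable' option as countEachLabel when Labels is a table:

```matlab
% Keep 80% of the files for each Content value for training and the
% remaining 20% for testing. The split is balanced per label value.
[adsTrain,adsTest] = splitEachLabel(ads,0.8,'TableVariable','Content');

% Verify the label distribution in the training set.
countEachLabel(adsTrain,'TableVariable','Content')
```

The resulting adsTrain and adsTest datastores can be passed directly to training and evaluation code.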
For examples of using labeled audio data in a machine learning or deep learning workflow, see the machine learning and deep learning examples in the Audio Toolbox documentation.
See Also

audioDatastore