AI for Audio
Audio Toolbox™ provides functionality to develop machine learning and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, speech separation, acoustic scene recognition, denoising, and many more.
Use audioDatastore to ingest large audio data sets and process files in parallel. You can also build audio data sets by annotating audio recordings manually and automatically.
Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets. Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
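The three objects above can be chained into a small ingest, augment, and extract pipeline. The following is a minimal sketch rather than a recommended configuration: it assumes Audio Toolbox is installed, and the scratch folder, the synthesized test tones, the augmentation probabilities, and the window settings are all illustrative choices that do not come from this page.

```matlab
% Write two short synthetic tones to a scratch folder so the sketch is
% self-contained (stand-ins for real recordings; the folder is hypothetical).
folder = tempname;
mkdir(folder);
fs = 16000;                              % assumed sample rate in Hz
t = (0:fs-1)'/fs;                        % one second of samples
audiowrite(fullfile(folder,'tone440.wav'),0.5*sin(2*pi*440*t),fs);
audiowrite(fullfile(folder,'tone880.wav'),0.5*sin(2*pi*880*t),fs);

% Ingest: the datastore indexes every audio file in the folder.
ads = audioDatastore(folder);
[audioIn,fileInfo] = read(ads);          % first file and its metadata

% Augment: apply a randomized sequence of built-in methods, twice.
augmenter = audioDataAugmenter( ...
    'AugmentationMode','sequential', ...
    'NumAugmentations',2, ...
    'TimeStretchProbability',0.5, ...
    'PitchShiftProbability',0.5, ...
    'VolumeControlProbability',0.5, ...
    'AddNoiseProbability',0.5, ...
    'TimeShiftProbability',0.5);
augmented = augment(augmenter,audioIn,fileInfo.SampleRate);  % table: Audio, AugmentationInfo

% Extract: several features from one pass over the same analysis windows.
afe = audioFeatureExtractor( ...
    'SampleRate',fileInfo.SampleRate, ...
    'Window',hamming(512,'periodic'), ...
    'OverlapLength',256, ...
    'mfcc',true, ...
    'spectralCentroid',true, ...
    'pitch',true);
features = extract(afe,audioIn);         % rows are analysis frames, columns are features
idx = info(afe);                         % maps each feature name to its column indices
```

Each row of augmented holds one augmented signal together with a record of the parameters that were drawn for it, and idx shows which columns of features belong to which feature. Parallel processing of the datastore is not shown here.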
Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.
Categories
Apply AI workflows to audio applications
Ingest, create, and label large data sets
Mel spectrogram, MFCC, pitch, spectral descriptors
Augmentation pipelines, shift pitch and time, stretch time, control volume and noise
Detect and isolate speech and other sounds
- Pretrained Models
Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks
Use a pretrained model or third-party APIs for text-to-speech and speech-to-text
- Code Generation and GPU Support
Generate portable C/C++/MEX functions and use GPUs to deploy or accelerate processing