
AI for Audio

Data set management, labeling, and augmentation; segmentation and feature extraction for audio, speech, and acoustic applications

Audio Toolbox™ provides functionality to develop machine learning and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, speech separation, acoustic scene recognition, denoising, and many more.

  • Use audioDatastore to ingest large audio data sets and process files in parallel.

  • Build audio data sets by annotating audio recordings manually or automatically.

  • Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.

  • Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
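
The ingestion, augmentation, and feature extraction steps listed above can be sketched roughly as follows. This is a minimal illustration, not a recommended pipeline: the temporary folder name, the two class names, the 16 kHz sample rate, and the augmentation probabilities are all assumptions chosen so the example can run without any particular recordings. Parallel processing is left out to keep the sketch small.

```matlab
% Sketch only: build a tiny synthetic data set in a temporary folder.
fs = 16e3;                                   % sample rate in Hz (assumed)
dataDir = fullfile(tempdir, "audioSketch");  % hypothetical folder name
classes = ["tone", "noise"];                 % hypothetical class labels
for k = 1:numel(classes)
    classDir = fullfile(dataDir, classes(k));
    if ~isfolder(classDir)
        mkdir(classDir);
    end
end
t = (0:fs-1).' / fs;                         % one second of sample times
audiowrite(fullfile(dataDir, "tone", "a.wav"), 0.5*sin(2*pi*440*t), fs);
audiowrite(fullfile(dataDir, "noise", "b.wav"), 0.1*randn(fs, 1), fs);

% Ingest the folder; labels are taken from the subfolder names.
ads = audioDatastore(dataDir, ...
    "IncludeSubfolders", true, ...
    "LabelSource", "foldernames");

% Randomized augmentation: each call returns two variants per input signal.
augmenter = audioDataAugmenter( ...
    "NumAugmentations", 2, ...
    "TimeStretchProbability", 0.5, ...
    "PitchShiftProbability", 0.5, ...
    "VolumeControlProbability", 0.5, ...
    "AddNoiseProbability", 0.5);

% One extractor configured for several features at once.
afe = audioFeatureExtractor( ...
    "SampleRate", fs, ...
    "mfcc", true, ...
    "spectralCentroid", true, ...
    "pitch", true);

while hasdata(ads)
    [x, info] = read(ads);
    % augment returns a table; the Audio variable holds the augmented signals.
    augmented = augment(augmenter, x, info.SampleRate);
    for k = 1:height(augmented)
        % Rows are analysis frames, columns are the enabled features.
        features = extract(afe, augmented.Audio{k});
        fprintf("%s: %d frames x %d features\n", ...
            string(info.Label), size(features, 1), size(features, 2));
    end
end
```

Because time stretching changes the signal length, the number of frames can differ between augmented variants of the same file.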

Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.

Categories


  • Apply AI workflows to audio applications

  • Ingest, create, and label large data sets

  • Mel spectrogram, MFCC, pitch, spectral descriptors

  • Augmentation pipelines, shift pitch and time, stretch time, control volume and noise

  • Detect and isolate speech and other sounds

  • Pretrained models
    Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks

  • Use a pretrained model or third-party APIs for text-to-speech and speech-to-text

  • Code generation and GPU support
    Generate portable C/C++/MEX functions and use GPUs to deploy or accelerate processing