
Modeling and Prediction

Develop predictive models using topic models and word embeddings

To find clusters and extract features from high-dimensional text datasets, you can use machine learning techniques and models such as LSA, LDA, and word embeddings. You can combine features created with Text Analytics Toolbox™ with features from other data sources. With these features, you can build machine learning models that take advantage of textual, numeric, and other types of data.
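For example, a minimal sketch of this combined-features workflow might look like the following. The example strings, labels, and numeric features are placeholders invented for illustration, and the final call assumes Statistics and Machine Learning Toolbox is available for fitcecoc.

    % Placeholder text data and class labels.
    textData = ["Coolant is pooling underneath sorter."
                "Sorter blows fuses at start up."
                "Loud rattling sounds are coming from the assembler."];
    labels = categorical(["Leak";"Electronic Failure";"Mechanical Failure"]);

    % Tokenize and lightly preprocess the raw text.
    documents = tokenizedDocument(textData);
    documents = lower(documents);
    documents = erasePunctuation(documents);

    % Encode each document as a row of word counts (bag-of-words features).
    bag = bagOfWords(documents);
    textFeatures = full(encode(bag,documents));

    % Combine the text features with features from other data sources,
    % here represented by placeholder numeric columns.
    otherFeatures = rand(numel(textData),2);
    X = [textFeatures otherFeatures];

    % Train a multiclass model on the combined feature matrix.
    mdl = fitcecoc(X,labels);

Swapping the raw counts for tf-idf weights, by calling tfidf(bag,documents) instead of encode, is a common variation of the same idea.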

Functions

bagOfWords - Bag-of-words model
bagOfNgrams - Bag-of-n-grams model
addDocument - Add documents to bag-of-words or bag-of-n-grams model
removeDocument - Remove documents from bag-of-words or bag-of-n-grams model
removeInfrequentWords - Remove words with low counts from bag-of-words model
removeInfrequentNgrams - Remove infrequently seen n-grams from bag-of-n-grams model
removeWords - Remove selected words from documents or bag-of-words model
removeNgrams - Remove n-grams from bag-of-n-grams model
removeEmptyDocuments - Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
topkwords - Most important words in bag-of-words model or LDA topic
topkngrams - Most frequent n-grams
encode - Encode documents as matrix of word or n-gram counts
tfidf - Term frequency–inverse document frequency (tf-idf) matrix
join - Combine multiple bag-of-words or bag-of-n-grams models
vaderSentimentScores - Sentiment scores with VADER algorithm
ratioSentimentScores - Sentiment scores with ratio rule
fastTextWordEmbedding - Pretrained fastText word embedding
wordEncoding - Word encoding model to map words to indices and back
doc2sequence - Convert documents to sequences for deep learning
wordEmbeddingLayer - Word embedding layer for deep learning networks
word2vec - Map word to embedding vector
word2ind - Map word to encoding index
vec2word - Map embedding vector to word
ind2word - Map encoding index to word
isVocabularyWord - Test if word is member of word embedding or encoding
readWordEmbedding - Read word embedding from file
trainWordEmbedding - Train word embedding
writeWordEmbedding - Write word embedding file
wordEmbedding - Word embedding model to map words to vectors and back
extractSummary - Extract summary from documents
rakeKeywords - Extract keywords using RAKE
textrankKeywords - Extract keywords using TextRank
bleuEvaluationScore - Evaluate translation or summarization with BLEU similarity score
rougeEvaluationScore - Evaluate translation or summarization with ROUGE similarity score
bm25Similarity - Document similarities with BM25 algorithm
cosineSimilarity - Document similarities with cosine similarity
textrankScores - Document scoring with TextRank algorithm
lexrankScores - Document scoring with LexRank algorithm
mmrScores - Document scoring with Maximal Marginal Relevance (MMR) algorithm
fitlda - Fit latent Dirichlet allocation (LDA) model
fitlsa - Fit LSA model
resume - Resume fitting LDA model
logp - Document log-probabilities and goodness of fit of LDA model
predict - Predict top LDA topics of documents
transform - Transform documents into lower-dimensional space
ldaModel - Latent Dirichlet allocation (LDA) model
lsaModel - Latent semantic analysis (LSA) model
addEntityDetails - Add entity tags to documents
trainHMMEntityModel - Train HMM-based model for named entity recognition (NER)
predict - Predict entities using named entity recognition (NER) model
hmmEntityModel - HMM-based model for named entity recognition (NER)
wordcloud - Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
textscatter - 2-D scatter plot of text
textscatter3 - 3-D scatter plot of text
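As a rough illustration of how several of the functions listed above fit together, the following sketch runs a small invented corpus through bag-of-words creation, tf-idf weighting, and LDA topic modeling; the document text and the choice of two topics are arbitrary.

    % Tokenize a few invented example documents.
    documents = tokenizedDocument([
        "the engine overheated and shut down"
        "coolant leaked from the engine block"
        "the conveyor belt motor stalled repeatedly"
        "replaced the motor on the conveyor belt"]);

    % Build a bag-of-words model and trim very rare words.
    bag = bagOfWords(documents);
    bag = removeInfrequentWords(bag,1);   % drop words that appear only once
    bag = removeEmptyDocuments(bag);

    % Weight the counts with tf-idf and inspect the most frequent words.
    M = tfidf(bag);
    tbl = topkwords(bag,5)

    % Fit a two-topic latent Dirichlet allocation (LDA) model.
    mdl = fitlda(bag,2,'Verbose',0);
    topicWords = topkwords(mdl,5,1)       % top words of topic 1

    % Score the topic mixture of a new document.
    newDoc = tokenizedDocument("the engine coolant was leaking");
    topicMixture = transform(mdl,newDoc);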

Topics

Classification and Modeling


  • This example shows how to create a function that cleans and preprocesses text data for analysis using the Preprocess Text Data Live Editor task.
  • Create Simple Text Model for Classification
    This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.

  • This example shows how to train a document classifier by converting documents to feature vectors using word embeddings.
  • Analyze Text Data Using Multiword Phrases
    This example shows how to analyze text using n-gram frequency counts.
  • Analyze Text Data Using Topic Models
    This example shows how to use the latent Dirichlet allocation (LDA) topic model to analyze text data.
  • Choose Number of Topics for LDA Model
    This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model; a brief sketch of this approach appears after this list.
  • Compare LDA Solvers
    This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.

  • This example shows how to visualize the clustering of documents using a latent Dirichlet allocation (LDA) topic model and a t-SNE plot.

  • This example shows how to analyze correlations between topics in a latent Dirichlet allocation (LDA) topic model.

  • This example shows how to fit a latent Dirichlet allocation (LDA) topic model and visualize correlations between the LDA topics and document labels.

  • This example shows how to train a custom named entity recognition (NER) model.

  • This example shows how to create a co-occurrence network using a bag-of-words model.
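As referenced in the topic-count item above, one common way to choose the number of LDA topics is to compare validation perplexity across candidate values. The sketch below assumes documentsTrain and documentsValidation are existing tokenizedDocument arrays; the candidate range is arbitrary.

    % Fit models with different numbers of topics and record the
    % perplexity on held-out validation documents.
    bagTrain = bagOfWords(documentsTrain);
    numTopicsRange = 2:2:10;
    perplexity = zeros(size(numTopicsRange));
    for i = 1:numel(numTopicsRange)
        mdl = fitlda(bagTrain,numTopicsRange(i),'Verbose',0);
        [~,perplexity(i)] = logp(mdl,documentsValidation);
    end

    % A lower perplexity indicates a better fit to the validation data.
    [~,idx] = min(perplexity);
    bestNumTopics = numTopicsRange(idx);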

Sentiment Analysis and Keyword Extraction

  • Analyze Sentiment in Text
    This example shows how to use the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm for sentiment analysis; a brief sketch of the related functions appears after this list.
  • Generate Domain Specific Sentiment Lexicon
    This example shows how to generate a lexicon for sentiment analysis using 10-K and 10-Q financial reports.
  • Train a Sentiment Classifier
    This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.

  • This example shows how to extract keywords from text data using Rapid Automatic Keyword Extraction (RAKE).
  • Extract Keywords from Text Data Using TextRank
    This example shows how to extract keywords from text data using TextRank.
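As referenced in the sentiment item above, a minimal sketch of the sentiment-scoring and keyword-extraction functions looks like this; the two example sentences are invented for illustration.

    % Tokenize two invented example sentences.
    documents = tokenizedDocument([
        "The product quality is excellent and support was fantastic."
        "The update was terrible and broke several features."]);

    % Compound VADER sentiment score per document
    % (positive values indicate positive sentiment, negative values negative).
    compoundScores = vaderSentimentScores(documents)

    % Keyword candidates ranked by the RAKE and TextRank algorithms.
    rakeTbl = rakeKeywords(documents);
    textrankTbl = textrankKeywords(documents);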

Deep Learning

  • Classify Text Data Using Deep Learning
    This example shows how to classify text data using a deep learning long short-term memory (LSTM) network; a condensed sketch of this workflow appears after this list.

  • This example shows how to classify text data using a convolutional neural network.

  • This example shows how to classify out-of-memory text data with a deep learning network using a transformed datastore.

  • This example shows how to convert decimal strings to Roman numerals using a recurrent sequence-to-sequence encoder-decoder model with attention.

  • This example shows how to classify text data that has multiple independent labels.
  • Generate Text Using Deep Learning (Deep Learning Toolbox)
    This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.
  • Pride and Prejudice and MATLAB
    This example shows how to train a deep learning LSTM network to generate text using character embeddings.
  • Word-By-Word Text Generation Using Deep Learning
    This example shows how to train a deep learning LSTM network to generate text word by word.

  • This example shows how to classify text data using a deep learning bidirectional long short-term memory (BiLSTM) network with a custom training loop.

  • This example shows how to generate text data using autoencoders.

  • This example shows how to define a text encoder model function.

  • This example shows how to define a text decoder model function.

  • This example shows how to train a German-to-English language translator using a recurrent sequence-to-sequence encoder-decoder model with attention.
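As referenced in the first item of this list, a condensed sketch of the LSTM text-classification workflow is shown below. It assumes Deep Learning Toolbox is installed and that documentsTrain (a tokenizedDocument array) and YTrain (a categorical label array) already exist; the sequence length, embedding dimension, and layer sizes are arbitrary choices.

    % Convert documents to padded sequences of word indices.
    enc = wordEncoding(documentsTrain);
    XTrain = doc2sequence(enc,documentsTrain,'Length',75);

    numWords = enc.NumWords;
    embeddingDimension = 50;
    numClasses = numel(categories(YTrain));

    % LSTM network with a trainable word embedding layer.
    layers = [
        sequenceInputLayer(1)
        wordEmbeddingLayer(embeddingDimension,numWords)
        lstmLayer(80,'OutputMode','last')
        fullyConnectedLayer(numClasses)
        softmaxLayer
        classificationLayer];

    options = trainingOptions('adam','MaxEpochs',10,'Verbose',false);
    net = trainNetwork(XTrain,YTrain,layers,options);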

Language Support


  • Information on using Text Analytics Toolbox features for other languages; a small language-aware tokenization sketch appears after this list.

  • Information on Japanese support in Text Analytics Toolbox.
  • Analyze Japanese Text Data
    This example shows how to import, prepare, and analyze Japanese text data using a topic model.

  • Information on German support in Text Analytics Toolbox.
  • Analyze German Text Data
    This example shows how to import, prepare, and analyze German text data using a topic model.
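As referenced in the first item of this list, a small sketch of language-aware tokenization; the Japanese and German sentences are placeholder examples.

    % Japanese text is detected and tokenized automatically.
    strJa = "深層学習はとても楽しいです。";
    documentsJa = tokenizedDocument(strJa);

    % The language can also be set explicitly, here to German.
    strDe = "Maschinelles Lernen macht viel Spaß.";
    documentsDe = tokenizedDocument(strDe,'Language','de');

    % Language-aware stop-word removal and normalization.
    documentsDe = removeStopWords(documentsDe);
    documentsDe = normalizeWords(documentsDe);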