document scoring with textrank algorithm

Since R2020a

syntax

scores = textrankScores(documents)

scores = textrankScores(bag)

description

scores = textrankScores(documents) scores documents for importance according to pairwise similarity values using the TextRank algorithm. To compute similarities and importance scores, the function uses the BM25 and PageRank algorithms, respectively.

scores = textrankScores(bag) scores documents encoded by a bag-of-words or bag-of-n-grams model bag.
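To make the PageRank step concrete, the following is a minimal sketch that scores four documents from a pairwise similarity matrix by power iteration. The similarity values are made up for illustration, and the damping factor, tolerance, and column normalization are assumptions; the exact settings used inside textrankScores are not stated on this page, so the numbers will not match its output.

```matlab
% Hypothetical pairwise similarity matrix for four documents (made-up values).
% S(i,j) is the similarity between documents i and j; the diagonal is zero.
S = [0   0.9 0.4 0.1
     0.9 0   0.4 0.1
     0.4 0.4 0   0.3
     0.1 0.1 0.3 0];

d = 0.85;          % damping factor (assumed, a common PageRank choice)
n = size(S,1);     % number of documents

% Normalize each column so that it sums to 1.
P = S ./ sum(S,1);

% Power iteration, starting from a uniform score vector.
scores = ones(n,1)/n;
for iter = 1:100
    newScores = (1-d)/n + d*(P*scores);
    converged = norm(newScores - scores,1) < 1e-10;
    scores = newScores;
    if converged
        break
    end
end

disp(scores)
```

Documents that are similar to many other documents accumulate a higher score, while a document with weak links to the rest ends up near the bottom.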

examples

importance of documents

Create an array of tokenized documents.

str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str)

documents = 
  4x1 tokenizedDocument:
    9 tokens: the quick brown fox jumped over the lazy dog
    9 tokens: the fast brown fox jumped over the lazy dog
    8 tokens: the lazy dog sat there and did nothing
    6 tokens: the other animals sat there watching

Calculate the TextRank scores.

scores = textrankScores(documents);

Visualize the scores in a bar chart.

figure
bar(scores)
xlabel("document")
ylabel("score")
title("textrank scores")

[Figure: bar chart of the TextRank score for each document.]
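Beyond plotting, a common next step is to order the documents by score. The sketch below repeats the setup from this example so that it runs on its own, then sorts the scores and lists each document with its score. The variable names sortedScores, idx, and ranked are illustrative only.

```matlab
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);
scores = textrankScores(documents);

% Sort the scores from highest to lowest and keep the original positions.
[sortedScores,idx] = sort(scores,'descend');

% List each document next to its score, most important first.
ranked = table(idx,sortedScores,joinWords(documents(idx)), ...
    'VariableNames',{'Index','Score','Text'})
```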

scores using bag-of-words model

Create a bag-of-words model from the text data in sonnets.csv.

filename = "sonnets.csv";
tbl = readtable(filename,'TextType','string');
textData = tbl.Sonnet;
documents = tokenizedDocument(textData);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:
          Counts: [154x3527 double]
      Vocabulary: ["from"    "fairest"    "creatures"    "we"    "desire"    "increase"    ","    "that"    "thereby"    "beauty's"    "rose"    "might"    "never"    "die"    "but"    "as"    "the"    "riper"    "should"    "by"    "time"    ...    ]
        NumWords: 3527
    NumDocuments: 154

Calculate the TextRank scores.

scores = textrankScores(bag);

Visualize the scores in a bar chart.

figure
bar(scores)
xlabel("document")
ylabel("score")
title("textrank scores")

[Figure: bar chart of the TextRank score for each document.]
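With 154 documents, the bar chart is hard to read at a glance, so it can help to pull out only the highest-scoring entries. The following sketch assumes sonnets.csv is on the path, as in the example above, and uses maxk to select the three top-scoring documents. The choice of three and the 40-character preview length are arbitrary.

```matlab
filename = "sonnets.csv";
tbl = readtable(filename,'TextType','string');
textData = tbl.Sonnet;
bag = bagOfWords(tokenizedDocument(textData));
scores = textrankScores(bag);

% Scores and row indices of the three highest-scoring documents.
[topScores,topIdx] = maxk(scores,3);

% Print the opening characters of each selected document.
for k = 1:numel(topIdx)
    preview = extractBefore(textData(topIdx(k)),41);
    fprintf("Document %d (score %.4f): %s...\n",topIdx(k),topScores(k),preview);
end
```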

input arguments

`documents` — input documents
`tokenizedDocument` array | string array of words | cell array of character vectors

Input documents, specified as a tokenizedDocument array, a string array of words, or a cell array of character vectors. If documents is not a tokenizedDocument array, then it must be a row vector representing a single document, where each element is a word. To specify multiple documents, use a tokenizedDocument array.

`bag` — input model
`bagOfWords` object | `bagOfNgrams` object

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.
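As an illustration of the n-gram case, the sketch below builds a bag-of-n-grams model from a few short documents and passes it to the function. It relies on the default behavior of bagOfNgrams, which creates bigrams; the sample sentences are reused from the first example.

```matlab
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);

% Encode the documents as bigrams (the default n-gram length is 2).
bag = bagOfNgrams(documents);

% Each bigram is treated as a single word when scoring.
scores = textrankScores(bag)
```

Because bigrams such as "lazy dog" are shared less often than individual words, the resulting scores can differ from those computed on the tokenized documents directly.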

output arguments

`scores` — textrank scores
vector

TextRank scores, returned as an N-by-1 vector, where scores(i) corresponds to the score for the ith input document and N is the number of input documents.

references

[1] Mihalcea, Rada, and Paul Tarau. "TextRank: Bringing Order into Text." In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404-411. 2004.

version history

Introduced in R2020a
