
Visualize Text Data Using Word Clouds

This example shows how to visualize text data using word clouds.

Text Analytics Toolbox extends the functionality of the MATLAB wordcloud function. It adds support for creating word clouds directly from string arrays, as well as from bag-of-words models and LDA (latent Dirichlet allocation) topics.
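As a minimal sketch of the three input types, the following code builds word clouds from a small string array, from a bag-of-words model of the same text, and from one topic of a two-topic LDA model fitted with fitlda. The sample sentences, the preprocessing steps, and the number of topics are illustrative assumptions and are not part of the example data.

```matlab
% Hypothetical sample reports (NOT taken from factoryReports.csv)
str = [
    "The mixer tripped the fuses."
    "A fuse is blown in the mixer."
    "Coolant is leaking from the pipe."
    "The pipe is leaking coolant onto the floor."];

figure

% 1) Word cloud directly from a string array
subplot(1,3,1)
wordcloud(str);
title("String array")

% 2) Word cloud from a bag-of-words model (assumed preprocessing steps)
documents = tokenizedDocument(lower(str));
documents = erasePunctuation(documents);
documents = removeStopWords(documents);
bag = bagOfWords(documents);
subplot(1,3,2)
wordcloud(bag);
title("Bag of words")

% 3) Word cloud from one topic of an LDA model
rng('default')                     % make the LDA fit reproducible
mdl = fitlda(bag,2,'Verbose',0);   % fit a 2-topic model quietly
subplot(1,3,3)
wordcloud(mdl,1);                  % words of topic 1, sized by topic probability
title("LDA topic 1")
```

Passing a bag-of-words model instead of raw strings lets you control the preprocessing yourself; passing an ldaModel and a topic index shows the words that characterize that topic.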

Load the example data. The file factoryReports.csv contains factory reports, including a text description and categorical labels for each event.

filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');
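Before comparing groups of reports, it can help to check how many reports each label covers, because a word cloud built from only a few reports tends to be dominated by a handful of words. This sketch uses groupcounts and assumes the label columns of factoryReports.csv are named Category and Urgency.

```matlab
% Assumes factoryReports.csv (shipped with Text Analytics Toolbox) is on the path
filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');

% Number of reports per label; missing labels (if any) form their own group
categoryCounts = groupcounts(tbl,"Category")
urgencyCounts = groupcounts(tbl,"Urgency")
```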

Extract the text data from the Description column.

textdata = tbl.Description;
textdata(1:10)
ans = 10x1 string
    "Items are occasionally getting stuck in the scanner spools."
    "Loud rattling and banging sounds are coming from assembler pistons."
    "There are cuts to the power when starting the plant."
    "Fried capacitors in the assembler."
    "Mixer tripped the fuses."
    "Burst pipe in the constructing agent is spraying coolant."
    "A fuse is blown in the mixer."
    "Things continue to tumble off of the belt."
    "Falling items from the conveyor belt."
    "The scanner reel is split, it will soon begin to curve."

Create a word cloud from the reports.

figure
wordcloud(textdata);
title("Factory Reports")

[Figure: word cloud titled "Factory Reports"]
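If you want control over how the text is cleaned before it is counted, one option is to preprocess the reports yourself and pass a bag-of-words model to wordcloud. The pipeline below (lemmatization, removal of stop words, punctuation, and very short or very long words) is one reasonable choice of steps, not the only one; the thresholds 2 and 15 are assumptions.

```matlab
% Assumes factoryReports.csv is on the path with a Description column
filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');
textdata = tbl.Description;

documents = tokenizedDocument(textdata);
documents = addPartOfSpeechDetails(documents);          % helps lemmatization
documents = removeStopWords(documents);
documents = normalizeWords(documents,'Style','lemma');  % e.g. "fuses" -> "fuse"
documents = lower(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);   % drop words with 2 or fewer characters
documents = removeLongWords(documents,15);   % drop words with 15 or more characters

bag = bagOfWords(documents);

figure
wordcloud(bag);
title("Factory Reports (preprocessed)")
```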

Compare the words in the reports with the labels "Leak" and "Mechanical Failure". Create a word cloud of the reports for each label, with word colors blue and magenta, respectively.

figure
labels = tbl.Category;
subplot(1,2,1)
idx = labels == "Leak";
wordcloud(textdata(idx),'Color','blue');
title("Leak")
subplot(1,2,2)
idx = labels == "Mechanical Failure";
wordcloud(textdata(idx),'Color','magenta');
title("Mechanical Failure")

[Figure: two word clouds titled "Leak" and "Mechanical Failure"]
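The same comparison generalizes to every label in the data. This sketch loops over the distinct values of the Category column (assumed name) and lays the word clouds out on a near-square grid, so you do not need to hard-code the label names.

```matlab
% Assumes factoryReports.csv is on the path with Description and Category columns
filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');
textdata = tbl.Description;
labels = tbl.Category;

% Distinct, non-missing labels and a grid large enough to hold them
labelNames = unique(labels(~ismissing(labels)));
numLabels = numel(labelNames);
numRows = ceil(sqrt(numLabels));
numCols = ceil(numLabels/numRows);

figure
for i = 1:numLabels
    subplot(numRows,numCols,i)
    idx = labels == labelNames(i);   % reports with the i-th label
    wordcloud(textdata(idx));
    title(labelNames(i))
end
```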

Compare the words in the reports with urgency "Low", "Medium", and "High" by creating one word cloud for each level.

figure
urgency = tbl.Urgency;
subplot(1,3,1)
idx = urgency == "Low";
wordcloud(textdata(idx));
title("Urgency: Low")
subplot(1,3,2)
idx = urgency == "Medium";
wordcloud(textdata(idx));
title("Urgency: Medium")
subplot(1,3,3)
idx = urgency == "High";
wordcloud(textdata(idx));
title("Urgency: High")

[Figure: three word clouds titled "Urgency: Low", "Urgency: Medium", and "Urgency: High"]
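Word sizes in a cloud are hard to compare precisely across panels. As a complementary sketch, topkwords lists the most frequent words and their counts for each urgency level; the choice of five words and the light preprocessing are assumptions.

```matlab
% Assumes factoryReports.csv is on the path with Description and Urgency columns
filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');
textdata = tbl.Description;
urgency = tbl.Urgency;

urgencyLevels = ["Low" "Medium" "High"];
topWords = cell(1,numel(urgencyLevels));
for i = 1:numel(urgencyLevels)
    idx = urgency == urgencyLevels(i);
    documents = tokenizedDocument(textdata(idx));
    documents = erasePunctuation(removeStopWords(lower(documents)));
    bag = bagOfWords(documents);
    topWords{i} = topkwords(bag,5);   % table with Word and Count, most frequent first
    fprintf("Urgency: %s\n",urgencyLevels(i));
    disp(topWords{i})
end
```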

Compare the words in reports with costs over $100 to the words in reports with costs over $1,000. Create a word cloud for each group, with highlight colors blue and red, respectively. Note that the first group also contains every report in the second.

cost = tbl.Cost;
idx = cost > 100;
figure
wordcloud(textdata(idx),'HighlightColor','blue');
title("Cost > $100")

[Figure: word cloud titled "Cost > $100"]

idx = cost > 1000;
figure
wordcloud(textdata(idx),'HighlightColor','red');
title("Cost > $1,000")

[Figure: word cloud titled "Cost > $1,000"]
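Because the condition cost > 100 also selects every report costing more than $1,000, the two groups above overlap. To compare costs in the hundreds with costs in the thousands directly, one option is to split the reports into non-overlapping ranges first. The bin edges below and the Cost column name are assumptions.

```matlab
% Assumes factoryReports.csv is on the path with Description and Cost columns
filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');
textdata = tbl.Description;
cost = tbl.Cost;

% Disjoint ranges: [100, 1000) and [1000, Inf); NaN costs fall in neither
idxHundreds = cost >= 100 & cost < 1000;
idxThousands = cost >= 1000;

figure
subplot(1,2,1)
wordcloud(textdata(idxHundreds),'HighlightColor','blue');
title("$100 <= Cost < $1,000")
subplot(1,2,2)
wordcloud(textdata(idxThousands),'HighlightColor','red');
title("Cost >= $1,000")
```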
