Conduct Sentiment Analysis Using Historical Tweets
This example shows how to search for and retrieve all available Tweets from the last 7 days and import them into MATLAB®. After importing the data, you can conduct sentiment analysis. This analysis enables you to determine subjective information, such as moods, opinions, or emotional reactions, from text data. This example searches for positive and negative moods regarding the financial services industry.
To run this example, you need Twitter® credentials. To obtain these credentials, you must first log in to your Twitter account. Then, fill out the form in .
To access the example code, enter edit TwitterExample.m at the command line.
Connect to Twitter
Create a Twitter connection using your credentials. (The values in this example do not represent real Twitter credentials.)
consumerkey = 'abcdefghijklmnop123456789';
consumersecret = 'qrstuvwxyz123456789';
accesstoken = '123456789abcdefghijklmnop';
accesstokensecret = '123456789qrstuvwxyz';
c = twitter(consumerkey,consumersecret,accesstoken,accesstokensecret);
Check the Twitter connection. If the StatusCode property has the value OK, the connection is successful.
c.StatusCode
ans = OK
Retrieve Latest Tweets
Search for the latest 100 Tweets about the financial services industry using the Twitter connection object. Use the search term financial services. Import Tweet® data into the MATLAB workspace.
tweetquery = 'financial services';
s = search(c,tweetquery,'count',100);
statuses = s.Body.Data.statuses;
pause(2)
statuses contains the Tweet data as a cell array of 100 structures. Each structure contains a field for the Tweet text, and the remaining fields contain other information about the Tweet.
Search for and retrieve the next 100 Tweets that have occurred since the previous request.
srefresh = search(c,tweetquery,'count',100, ...
    'since_id',s.Body.Data.search_metadata.max_id_str);
statuses = [statuses;srefresh.Body.Data.statuses];
statuses contains the latest 100 Tweets in addition to the previous 100 Tweets.
Retrieve All Available Tweets
Retrieve all available Tweets about the financial services industry using a while loop. Check for available data using the isfield function and the structure field next_results.
while isfield(s.Body.Data.search_metadata,'next_results')
    % Convert results to string
    nextresults = string(s.Body.Data.search_metadata.next_results);
    % Extract maximum Tweet identifier
    max_id = extractBetween(nextresults,"max_id=","&");
    % Convert maximum Tweet identifier to a character vector
    cmax_id = char(max_id);
    % Search for Tweets
    s = search(c,tweetquery,'count',100,'max_id',cmax_id);
    % Append the retrieved Tweets to the existing results
    statuses = [statuses;s.Body.Data.statuses];
end
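The string handling inside the loop can be tried offline. The following is a minimal sketch: the next_results value is a made-up sample in the query-string shape the loop expects, not real output from a search.

```matlab
% Hypothetical next_results value (assumption). A real one comes from
% s.Body.Data.search_metadata.next_results.
nextresults = "?max_id=1234567890123456789&q=financial%20services&count=100";

% Pull out the text between "max_id=" and the next "&"
max_id = extractBetween(nextresults,"max_id=","&");

% Convert the string to a character vector, as the loop does
cmax_id = char(max_id);
disp(cmax_id)
```

Keeping the identifier as text avoids rounding. A 19-digit value is larger than flintmax (2^53), so converting it to double could change it.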
Retrieve the creation time and text of each Tweet. If the data is unstructured (a cell array of structures), read both fields from each cell in a loop. If the data is structured (a structure array), collect each field into a cell array and transpose it into a column.
if iscell(statuses)
    % Unstructured data
    numtweets = length(statuses);             % Determine total number of Tweets
    tweettimes = cell(numtweets,1);           % Allocate space for Tweet times and Tweet text
    tweettexts = tweettimes;
    for i = 1:numtweets
        tweettimes{i} = statuses{i}.created_at; % Retrieve the time each Tweet was created
        tweettexts{i} = statuses{i}.text;       % Retrieve the text of each Tweet
    end
else
    % Structured data
    tweettimes = {statuses.created_at}';
    tweettexts = {statuses.text}';
end
tweettimes contains the creation time for each Tweet. tweettexts contains the text for each Tweet.
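The two access patterns can be compared on a small sample. This is a sketch with made-up records, not real Tweets. The variable names asstruct and ascell are hypothetical, and cellfun stands in for the for loop used in the example.

```matlab
% Made-up sample records (assumption), first as a 1-by-2 structure array
asstruct = struct( ...
    'created_at',{'Mon Jan 01 10:00:00 +0000 2018','Mon Jan 01 11:30:00 +0000 2018'}, ...
    'text',{'Great service today','RT @bank: slow response times'});

% The same records as a cell array of structures
ascell = {asstruct(1); asstruct(2)};

% Structure array: collect each field as a comma-separated list, then transpose
times1 = {asstruct.created_at}';
texts1 = {asstruct.text}';

% Cell array of structures: visit each cell in turn
times2 = cellfun(@(t) t.created_at,ascell,'UniformOutput',false);
texts2 = cellfun(@(t) t.text,ascell,'UniformOutput',false);

% Both routes yield the same 2-by-1 cell arrays of character vectors
same = isequal(times1,times2) && isequal(texts1,texts2);
```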
Create the timetable tweets for all Tweets by using the text and creation time of each Tweet.
tweets = timetable(tweettexts,'RowTimes', ...
    datetime(tweettimes,'Format','eee MMM dd HH:mm:ss +SSSS yyyy'));
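The timetable step can be checked on a couple of hand-written records. This sketch uses made-up timestamps in the created_at layout. It also swaps in a different parsing approach from the example: the 'InputFormat', 'TimeZone', and 'Locale' options of datetime, with the Z specifier reading the +0000 offset. Treat that choice as an assumption, not as the method the example uses.

```matlab
% Made-up sample values in the created_at layout (assumption)
tweettimes = {'Wed Aug 27 13:08:45 +0000 2008'; 'Thu Aug 28 09:15:00 +0000 2008'};
tweettexts = {'Great quarter for retail banking'; 'RT @bank: fees are bad'};

% Parse the text. Z reads the +0000 offset, and the locale pins the
% day and month abbreviations to English regardless of system settings.
rowtimes = datetime(tweettimes,'InputFormat','eee MMM dd HH:mm:ss Z yyyy', ...
    'TimeZone','UTC','Locale','en_US');

% Build the timetable; the variable is named after the input, tweettexts
tweets = timetable(tweettexts,'RowTimes',rowtimes);
```

Because the row times are datetime values, the rows can then be sorted or selected by time range.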
Conduct Sentiment Analysis on Tweets
Create a glossary of words that are associated with positive sentiment.
poskeywords = {'happy','great','good', ...
    'fast','optimized','nice','interesting','amazing','top','award', ...
    'winner','wins','cool','thanks','useful'};
poskeywords is a cell array of character vectors. Each character vector is a word that represents an instance of positive sentiment.
Search each Tweet for words in the positive sentiment glossary. Determine the total number of Tweets that contain a positive sentiment. Of those positive Tweets, determine how many are Retweets.
% Determine the total number of Tweets
numtweets = height(tweets);
% Determine the positive Tweets
numpostweets = 0;
numposrts = 0;
for i = 1:numtweets
    % Compare Tweet to positive sentiment glossary
    djobs = contains(tweets.tweettexts{i},poskeywords,'IgnoreCase',true);
    if djobs
        % Increase total count of Tweets with positive sentiment by one
        numpostweets = numpostweets + 1;
        % Determine if positive Tweet is a Retweet
        rts = strncmp('RT @',tweets.tweettexts{i},4);
        if rts
            % Increase total count of positive Retweets by one
            numposrts = numposrts + 1;
        end
    end
end
numpostweets contains the total number of Tweets with positive sentiment. numposrts contains the total number of Retweets with positive sentiment.
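The same counts can be produced without a loop, because contains and startsWith both accept a cell array of character vectors. This is a sketch on made-up Tweets, and it swaps startsWith in for strncmp. The names texts, ispos, and isrt are hypothetical.

```matlab
% Made-up sample Tweets and a shortened glossary (assumption)
texts = {'Great results from the bank'; ...
         'RT @news: amazing growth in lending'; ...
         'Quarterly report published'; ...
         'RT @user: thanks for the useful update'};
poskeywords = {'great','amazing','thanks','useful'};

% One logical value per Tweet: does it contain any glossary word?
ispos = contains(texts,poskeywords,'IgnoreCase',true);

% One logical value per Tweet: does it start with the Retweet prefix?
% This test is case-sensitive.
isrt = startsWith(texts,'RT @');

numpostweets = nnz(ispos);        % Tweets with positive sentiment
numposrts = nnz(ispos & isrt);    % positive Tweets that are Retweets
```

For this sample, three Tweets match the glossary and two of those are Retweets.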
Create a glossary of words that are associated with negative sentiment.
negkeywords = {'sad','poor','bad','slow','weaken','mean','boring', ...
    'ordinary','bottom','loss','loser','loses','uncool', ...
    'criticism','useless'};
negkeywords is a cell array of character vectors. Each character vector is a word that represents an instance of negative sentiment.
Search each Tweet for words in the negative sentiment glossary. Determine the total number of Tweets that contain a negative sentiment. Of those negative Tweets, determine how many are Retweets.
% Determine the negative Tweets
numnegtweets = 0;
numnegrts = 0;
for i = 1:numtweets
    % Compare Tweet to negative sentiment glossary
    djobs = contains(tweets.tweettexts{i},negkeywords,'IgnoreCase',true);
    if djobs
        % Increase total count of Tweets with negative sentiment by one
        numnegtweets = numnegtweets + 1;
        % Determine if negative Tweet is a Retweet
        rts = strncmp('RT @',tweets.tweettexts{i},4);
        if rts
            % Increase total count of negative Retweets by one
            numnegrts = numnegrts + 1;
        end
    end
end
numnegtweets contains the total number of Tweets with negative sentiment. numnegrts contains the total number of Retweets with negative sentiment.
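Note that contains matches substrings, so a glossary word such as mean also matches inside meaningful. The sketch below contrasts that behavior with whole-word matching on made-up text. The regular-expression variant is an alternative offered for comparison, not part of the example, and the names substr, pattern, and whole are hypothetical.

```matlab
% Shortened glossary and made-up sample Tweets (assumption)
negkeywords = {'mean','loss','bad'};
texts = {'A meaningful update from the bank'; ...
         'The fund reported a loss this quarter'};

% Substring matching, as the example does with contains
substr = contains(texts,negkeywords,'IgnoreCase',true);

% Whole-word matching: \< and \> anchor the match to word boundaries
pattern = ['\<(' strjoin(negkeywords,'|') ')\>'];
whole = ~cellfun(@isempty,regexpi(texts,pattern,'once'));
```

Here substr flags both Tweets, while whole flags only the second one. Which behavior is preferable depends on the glossary and how much noise is acceptable in the counts.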
Display Sentiment Analysis Results
Create a table with columns that contain:
Number of Tweets
Number of Tweets with positive sentiment
Number of positive Retweets
Number of Tweets with negative sentiment
Number of negative Retweets
matlabtweettable = table(numtweets,numpostweets,numposrts,numnegtweets,numnegrts, ...
    'VariableNames',{'number_of_tweets','positive_tweets','positive_retweets', ...
    'negative_tweets','negative_retweets'});
Display the table of Tweet data.
matlabtweettable
matlabtweettable =

  1×5 table

    number_of_tweets    positive_tweets    positive_retweets    negative_tweets    negative_retweets
    ________________    _______________    _________________    _______________    _________________

         11465                688                 238                 201                 96
Out of 11,465 total Tweets about the financial services industry in the last 7 days, 688 Tweets have positive sentiment and 201 Tweets have negative sentiment. Out of the positive Tweets, 238 are Retweets. Out of the negative Tweets, 96 are Retweets.
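Raw counts are easier to compare as proportions. The following sketch reuses the counts reported above and derives percentages from them. The pct variable names are hypothetical.

```matlab
% Counts reported in the results table
numtweets = 11465;
numpostweets = 688;  numposrts = 238;
numnegtweets = 201;  numnegrts = 96;

% Share of all Tweets that matched each glossary, in percent
pctpos = 100*numpostweets/numtweets;
pctneg = 100*numnegtweets/numtweets;

% Share of the matching Tweets that are Retweets, in percent
pctposrt = 100*numposrts/numpostweets;
pctnegrt = 100*numnegrts/numnegtweets;

fprintf('Positive: %.1f%% of Tweets, %.1f%% of those are Retweets\n',pctpos,pctposrt);
fprintf('Negative: %.1f%% of Tweets, %.1f%% of those are Retweets\n',pctneg,pctnegrt);
```

For these counts, about 6% of the Tweets match the positive glossary and under 2% match the negative one, so most Tweets match neither list.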