Analyzing Twitter data using R
I am trying to analyze Twitter data using R by plotting the number of tweets over a period of time. When I write
plot(tweet_df$created_at, tweet_df$text)
I get this error message:
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
Here is the code I used:
library("rjson")
json_file <- "tweet.json"
json_data <- fromJSON(file=json_file)
library("streamR")
tweet_df <- parseTweets(tweets=file)
#using twitter data frame
tweet_df$created_at
tweet_df$text
plot(tweet_df$created_at, tweet_df$text)
You've got a couple of issues here, but nothing insurmountable. If you want to track tweets over time, you're really asking for tweets created per x time frame (tweets per minute, second, whatever). That means you only need the created_at column, and you can build the graph with R's hist function.

If you want to split by words mentioned in the text or whatever, that's doable, but you should use ggplot2, and maybe ask a different question. Anyway, it looks like parseTweets converts Twitter's timestamps into a character field, so you'll want to turn that into a POSIXct timestamp field that R can understand. Assuming you have a data frame that looks like this:
❥ head(tweet_df[,c("id_str","created_at")])
              id_str                     created_at
1 597862782101561346 Mon May 11 20:36:09 +0000 2015
2 597862782097346560 Mon May 11 20:36:09 +0000 2015
3 597862782105694208 Mon May 11 20:36:09 +0000 2015
4 597862782105694210 Mon May 11 20:36:09 +0000 2015
5 597862782076198912 Mon May 11 20:36:09 +0000 2015
6 597862782114078720 Mon May 11 20:36:09 +0000 2015
You can do this:
❥ dated_tweets <- as.POSIXct(tweet_df$created_at, format = "%a %b %d %H:%M:%S +0000 %Y")
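As a quick sanity check on that conversion, here is a minimal sketch using two made-up timestamps in the same layout as created_at (the sample values and the variable names created_at and parsed are assumptions for illustration, not part of the original data). It also passes tz = "UTC", which is my addition to match the literal +0000 offset, and counts how many values failed to parse.

```r
# Hypothetical sample values in the same layout as tweet_df$created_at
created_at <- c("Mon May 11 20:36:09 +0000 2015",
                "Mon May 11 20:37:41 +0000 2015")

# %a and %b match day and month names in the current locale, so this
# assumes an English (or C) locale; tz = "UTC" is an assumption that
# matches the literal +0000 offset in the strings
parsed <- as.POSIXct(created_at,
                     format = "%a %b %d %H:%M:%S +0000 %Y",
                     tz = "UTC")

# Any NA here means a value did not match the format string
print(sum(is.na(parsed)))
print(class(parsed))
```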
That will give you a vector of dated tweets in R's timestamp format. You can then plot them like this. I left a sample Twitter feed open for 15 mins or so. The result:
❥ hist(dated_tweets, breaks ="secs", freq = TRUE)
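If per-second bins are too fine, the same idea works per minute. The following is a rough sketch on simulated data, not the feed captured above: the 200 random timestamps, the start value, and the name per_minute are all assumptions made up for illustration. It counts tweets per minute with cut and table, then draws the same binning with hist.

```r
# Hypothetical timestamps: 200 simulated tweets spread over 15 minutes
set.seed(1)
start <- as.POSIXct("2015-05-11 20:36:09", tz = "UTC")
dated_tweets <- start + sort(runif(200, min = 0, max = 15 * 60))

# Tweets per minute as a table of counts, one entry per minute bin
per_minute <- table(cut(dated_tweets, breaks = "min"))
print(per_minute)

# The same one-minute binning drawn as a histogram of counts
hist(dated_tweets, breaks = "mins", freq = TRUE,
     main = "Tweets per minute", xlab = "Time")
```

The table is handy when you want the counts themselves, for example to feed into another plotting function, while hist is the quickest way to just look at the shape.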