
Recently Edwin Chen posted a great map of the regional usage of soda vs. pop vs. coke, created from geocoded tweets involving those words in the context of drinking. http://blog.echen.me/2012/07/06/soda-vs-pop-with-twitter/

He mentions that he used Jeff Gentry's twitteR package for R. Sure enough, it is easy to gather tweets that use a given word and put them in a data frame:

require(twitteR)
require(plyr)
# pull the 1000 most recent tweets containing "cats"
cat.tweets <- searchTwitter("cats", n = 1000)
# flatten the list of status objects into a data frame
tweets.df <- ldply(cat.tweets, function(t) t$toDataFrame())

The data frame (tweets.df) will contain the user ID, tweet text, etc. for each tweet, but it does not appear to contain the geocode. Any idea how to get it in R?

iantist
  • You need to provide a `geocode` for `searchTwitter` to use. See the library documentation `?searchTwitter`. – mindless.panda Jul 26 '12 at 17:46
  • I see that you can supply a geocode and radius into `searchTwitter`, but that does not produce a geocode for each pulled tweet. – iantist Jul 26 '12 at 18:19
  • But you would have the geocode that you supplied, right? With a smaller radius, might that give you what you need? – mindless.panda Jul 26 '12 at 18:25
  • Good idea, I see what you mean. I could iterate through essentially a grid of points across a given map. Thanks for the suggestion. – iantist Jul 26 '12 at 18:43
  • When you get it working you should answer your own question so others can see how you did it. I really like the post you linked to, but they didn't post any code. =( – mindless.panda Jul 26 '12 at 19:01
  • I'll keep working on it and try to make a package, I'll certainly post the code as well. – iantist Jul 27 '12 at 13:20
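
A minimal sketch of the approach the comments above describe, assuming `searchTwitter`'s `geocode` argument (a "lat,long,radius" string); the Denver coordinates and the grid spacing are purely illustrative:

require(twitteR)
# query a single point: geocode = "latitude,longitude,radius"
pt.tweets <- searchTwitter("cats", n = 100,
                           geocode = "39.7392,-104.9903,10mi")

# a crude grid of points spanning the contiguous US, one geocode string each
grid <- expand.grid(lat  = seq(25, 49, by = 2),
                    long = seq(-124, -67, by = 2))
grid$geocode <- paste(grid$lat, grid$long, "50mi", sep = ",")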

3 Answers


Does geocode mean a longitude and latitude coordinate? If so, the following commands work for me.

cat.tweets <- searchTwitter("cats", n = 1000)
# as.data.frame() on a status object includes latitude/longitude columns
tweets.df  <- do.call("rbind", lapply(cat.tweets, as.data.frame))

Source: LINK
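
If a tweet is geotagged, its coordinates come through as `latitude`/`longitude` columns (NA otherwise); a quick filter, assuming those twitteR column names:

# keep only the geotagged rows and their coordinates
geo.df <- tweets.df[!is.na(tweets.df$latitude),
                    c("text", "latitude", "longitude")]
head(geo.df)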

nurandi

I've been tinkering around with an R function: you enter the search text, the number of search sites, and the radius around each site, e.g. twitterMap("#rstats", 10, "10mi"). Here's the code:

twitterMap <- function(searchtext, locations, radius){
  require(ggplot2)
  require(maps)
  require(twitteR)
  # draw random locations within the bounding box of the contiguous US
  lat  <- runif(n = locations, min = 24.446667,   max = 49.384472)
  long <- runif(n = locations, min = -124.733056, max = -66.949778)
  coordinates <- data.frame(lat = lat, long = long)
  # build the "lat,long,radius" strings that searchTwitter() expects
  coordinates$search.twitter.entry <- paste(coordinates$lat,
                                            coordinates$long, radius, sep = ",")
  # search Twitter at each location and record how many tweets come back
  coordinates$number.of.tweets <- NA_integer_
  for(i in 1:nrow(coordinates)){
    coordinates$number.of.tweets[i] <-
      length(searchTwitter(searchString = searchtext, n = 1000,
                           geocode = coordinates$search.twitter.entry[i]))
  }
  # draw the US state outlines and plot each location, sized by tweet count
  all_states <- map_data("state")
  p <- ggplot() +
    geom_polygon(data = all_states, aes(x = long, y = lat, group = group),
                 colour = "grey", fill = NA) +
    geom_point(data = coordinates,
               aes(x = long, y = lat, size = number.of.tweets)) +
    scale_size(name = "# of tweets")
  p
}
# Example
twitterMap("dolphin", 15, "10mi")

[example map]

There are some big problems I've encountered that I'm not sure how to deal with. First, as written the code searches 15 randomly generated locations, drawn from a uniform distribution spanning the easternmost to westernmost longitudes and the northernmost to southernmost latitudes of the US. That will include locations outside the United States, say just east of Lake of the Woods, Minnesota, in Canada. I'd like the function to check whether each generated location is in the US and discard it if it isn't (see the sketch below). More importantly, I'd like to search thousands of locations, but Twitter doesn't like that and gives me a 420 "Enhance Your Calm" error. So perhaps it's best to search every few hours, slowly build a database, and delete duplicate tweets. Finally, if one chooses a remotely popular topic, R gives an error like `Error in function (type, msg, asError = TRUE) : transfer closed with 43756 bytes remaining to read`. I'm a bit mystified as to how to get around this problem.
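
For the first problem, one possible fix is a point-in-polygon test with the `maps` package, sketched here under the assumption that it runs on the same `lat`/`long` vectors the function generates:

require(maps)
# map.where("state", x, y) takes longitude then latitude and returns the
# enclosing state's name, or NA when the point falls outside every state
in.us <- !is.na(map.where("state", long, lat))
lat   <- lat[in.us]
long  <- long[in.us]

For the rate limit, spacing the calls out with `Sys.sleep()` between batches may help.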

iantist
  • please work on it... and post when it's figured out... even I need it – juggernauthk108 Aug 15 '16 at 19:12
  • can you tell me how to extract the longitude and latitude from the tweets that are harvested from `searchTwitter`? Then maybe you can use [this](http://www.mapbox.com) – juggernauthk108 Aug 15 '16 at 19:23
  • I'm getting an error message: `In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, : 15 tweets were requested but the API can only return 0` – Selrac Feb 05 '17 at 14:08

Here is a toy example, given that you can extract only 100 tweets per call:

require(XML)  # provides htmlTreeParse(), getNodeSet(), xpathApply()

page <- 1
# Aurora, CO with a radius of 3mi; q= is left empty so only the geocode filters
URL <- paste('http://search.twitter.com/search.atom?q=',
             '&geocode=39.724089,-104.820557,3mi',
             '&rpp=100&page=', page, sep = '')
doc    <- htmlTreeParse(URL, useInternal = TRUE)
entry  <- getNodeSet(doc, "//entry")
tweets <- c()

# each <entry> node is one tweet; its <title> holds the tweet text
for (i in seq_along(entry)){
    t <- unlist(xpathApply(entry[[i]], ".//title", xmlValue))
    tweets <- c(tweets, t)
}

This solution might not be too elegant, but I was able to get tweets for a particular geocode.
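
Since each call returns at most 100 tweets, looping over the `page` parameter is one way to collect more; a sketch assuming the old search.twitter.com Atom API's shape (`rpp` capped at 100, roughly 15 pages):

require(XML)
all.tweets <- c()
for (page in 1:15){
    URL <- paste('http://search.twitter.com/search.atom?q=',
                 '&geocode=39.724089,-104.820557,3mi',
                 '&rpp=100&page=', page, sep = '')
    doc   <- htmlTreeParse(URL, useInternal = TRUE)
    entry <- getNodeSet(doc, "//entry")
    if (length(entry) == 0) break   # ran out of results
    for (e in entry){
        all.tweets <- c(all.tweets, unlist(xpathApply(e, ".//title", xmlValue)))
    }
}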

notrockstar