
Recently Edwin Chen posted a great map of the regional usage of soda vs. pop vs. coke, created from geocoded tweets involving those words in the context of drinking. http://blog.echen.me/2012/07/06/soda-vs-pop-with-twitter/

He mentions that he used Jeff Gentry's twitteR package for R. Sure enough, it is easy to gather tweets that use a given word and put them in a data frame:

require(twitteR)
require(plyr)
# pull the 1000 most recent tweets containing "cats"
cat.tweets <- searchTwitter("cats", n = 1000)
# flatten the list of status objects into a data frame
tweets.df <- ldply(cat.tweets, function(t) t$toDataFrame())

The data frame (tweets.df) will contain the user ID, tweet text, etc. for each tweet, but it does not appear to contain the geocode. Any idea how to get it in R?

iantist
  • You need to provide a `geocode` for `searchTwitter` to use. See the library documentation `?searchTwitter`. – mindless.panda Jul 26 '12 at 17:46
  • I see that you can supply a geocode and radius into `searchTwitter`, but that does not produce a geocode for each pulled tweet. – iantist Jul 26 '12 at 18:19
  • But you would have the geocode that you supplied, right? With a smaller radius, might that give you what you need? – mindless.panda Jul 26 '12 at 18:25
  • Good idea, I see what you mean. I could iterate through essentially a grid of points across a given map. Thanks for the suggestion. – iantist Jul 26 '12 at 18:43
  • When you get it working you should answer your own question so others can see how you did it. I really like the post you linked to, but they didn't post any code. =( – mindless.panda Jul 26 '12 at 19:01
  • I'll keep working on it and try to make a package, I'll certainly post the code as well. – iantist Jul 27 '12 at 13:20
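
A minimal sketch of the approach the comments above describe, assuming `searchTwitter`'s `geocode` argument (a "lat,long,radius" string); the Denver coordinates and the grid spacing are purely illustrative:

require(twitteR)
# query a single point: geocode = "latitude,longitude,radius"
pt.tweets <- searchTwitter("cats", n = 100,
                           geocode = "39.7392,-104.9903,10mi")

# a crude grid of points spanning the contiguous US, one geocode string each
grid <- expand.grid(lat  = seq(25, 49, by = 2),
                    long = seq(-124, -67, by = 2))
grid$geocode <- paste(grid$lat, grid$long, "50mi", sep = ",")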

3 Answers


Does geocode mean a longitude and latitude coordinate? If so, the following commands work for me.

cat.tweets <- searchTwitter("cats", n = 1000)
# as.data.frame() on a status object includes latitude/longitude columns
tweets.df  <- do.call("rbind", lapply(cat.tweets, as.data.frame))

Source: LINK
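
If a tweet is geotagged, its coordinates come through as `latitude`/`longitude` columns (NA otherwise); a quick filter, assuming those twitteR column names:

# keep only the geotagged rows and their coordinates
geo.df <- tweets.df[!is.na(tweets.df$latitude),
                    c("text", "latitude", "longitude")]
head(geo.df)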

nurandi

I've been tinkering around with an R function: you enter the search text, the number of search sites, and the radius around each site, e.g. twitterMap("#rstats", 10, "10mi"). Here's the code:

twitterMap <- function(searchtext, locations, radius){
  require(ggplot2)
  require(maps)
  require(twitteR)
  # draw random locations within the bounding box of the contiguous US
  lat  <- runif(n = locations, min = 24.446667,   max = 49.384472)
  long <- runif(n = locations, min = -124.733056, max = -66.949778)
  coordinates <- data.frame(lat = lat, long = long)
  # build the "lat,long,radius" strings that searchTwitter() expects
  coordinates$search.twitter.entry <- paste(coordinates$lat,
                                            coordinates$long, radius, sep = ",")
  # search Twitter at each location and record how many tweets come back
  coordinates$number.of.tweets <- NA_integer_
  for(i in 1:nrow(coordinates)){
    coordinates$number.of.tweets[i] <-
      length(searchTwitter(searchString = searchtext, n = 1000,
                           geocode = coordinates$search.twitter.entry[i]))
  }
  # draw the US state outlines and plot each location, sized by tweet count
  all_states <- map_data("state")
  p <- ggplot() +
    geom_polygon(data = all_states, aes(x = long, y = lat, group = group),
                 colour = "grey", fill = NA) +
    geom_point(data = coordinates,
               aes(x = long, y = lat, size = number.of.tweets)) +
    scale_size(name = "# of tweets")
  p
}
# Example
twitterMap("dolphin", 15, "10mi")

[example map]

There are some big problems I've encountered that I'm not sure how to deal with. First, as written the code searches 15 randomly generated locations, drawn from a uniform distribution spanning the easternmost to westernmost longitudes and the northernmost to southernmost latitudes of the US. That will include locations outside the United States, say just east of Lake of the Woods, Minnesota, in Canada. I'd like the function to check whether each generated location is in the US and discard it if it isn't (see the sketch below). More importantly, I'd like to search thousands of locations, but Twitter doesn't like that and gives me a 420 "Enhance Your Calm" error. So perhaps it's best to search every few hours, slowly build a database, and delete duplicate tweets. Finally, if one chooses a remotely popular topic, R gives an error like `Error in function (type, msg, asError = TRUE) : transfer closed with 43756 bytes remaining to read`. I'm a bit mystified as to how to get around this problem.
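
For the first problem, one possible fix is a point-in-polygon test with the `maps` package, sketched here under the assumption that it runs on the same `lat`/`long` vectors the function generates:

require(maps)
# map.where("state", x, y) takes longitude then latitude and returns the
# enclosing state's name, or NA when the point falls outside every state
in.us <- !is.na(map.where("state", long, lat))
lat   <- lat[in.us]
long  <- long[in.us]

For the rate limit, spacing the calls out with `Sys.sleep()` between batches may help.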

iantist
  • please work on it... and post when it's figured out... even I need it – juggernauthk108 Aug 15 '16 at 19:12
  • can you tell me how to extract the longitude and latitude from the tweets that are harvested from `searchTwitter`? Then maybe you can use [this](http://www.mapbox.com) – juggernauthk108 Aug 15 '16 at 19:23
  • I'm getting an error message: `In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, : 15 tweets were requested but the API can only return 0` – Selrac Feb 05 '17 at 14:08

Here is a toy example, given that you can extract only 100 tweets per call:

require(XML)  # provides htmlTreeParse(), getNodeSet(), xpathApply()

page <- 1
# Aurora, CO with a radius of 3mi; q= is left empty so only the geocode filters
URL <- paste('http://search.twitter.com/search.atom?q=',
             '&geocode=39.724089,-104.820557,3mi',
             '&rpp=100&page=', page, sep = '')
doc    <- htmlTreeParse(URL, useInternal = TRUE)
entry  <- getNodeSet(doc, "//entry")
tweets <- c()

# each <entry> node is one tweet; its <title> holds the tweet text
for (i in seq_along(entry)){
    t <- unlist(xpathApply(entry[[i]], ".//title", xmlValue))
    tweets <- c(tweets, t)
}

This solution might not be too elegant, but I was able to get tweets for a particular geocode.
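
Since each call returns at most 100 tweets, looping over the `page` parameter is one way to collect more; a sketch assuming the old search.twitter.com Atom API's shape (`rpp` capped at 100, roughly 15 pages):

require(XML)
all.tweets <- c()
for (page in 1:15){
    URL <- paste('http://search.twitter.com/search.atom?q=',
                 '&geocode=39.724089,-104.820557,3mi',
                 '&rpp=100&page=', page, sep = '')
    doc   <- htmlTreeParse(URL, useInternal = TRUE)
    entry <- getNodeSet(doc, "//entry")
    if (length(entry) == 0) break   # ran out of results
    for (e in entry){
        all.tweets <- c(all.tweets, unlist(xpathApply(e, ".//title", xmlValue)))
    }
}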

notrockstar