
Here's what I am trying to do (I would like an opinion on whether it is possible or not).

I have 8000 entries with addresses, many of which are repeated because the data is crime data.

I would like to geocode, say, '800 Beatty st', which repeats 300 times, and output the longitude and latitude into a new column.

I know how to geocode one specific location, but I don't know how to put the output into a new column. Additionally, given the size of the data, I can't geocode one location at a time.

x <-c("800 BEATTY ST, VANCOUVER BC","800 BEATTY ST, VANCOUVER BC",
      "800 BEATTY ST, VANCOUVER BC","2900 PRINCE EDWARD ST, VANCOUVER BC",
      "2900 PRINCE EDWARD ST, VANCOUVER BC","2900 PRINCE EDWARD ST, VANCOUVER BC",
      "3600 KINGSWAY AVE, VANCOUVER BC")

require(ggmap)

geocode('800 BEATTY ST, VANCOUVER BC')
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=800+BEATTY+ST,+VANCOUVER+BC&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
        lon      lat
1 -123.1139 49.27763

jazzurro
zazu
    Please supply [a reproducible example](stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Nov 15 '14 at 23:19

1 Answer


I think you may be overthinking this. If the same address has to appear multiple times, think about how you want to build the vector. For instance, you could do something like `x <- rep(c("800 BEATTY ST, VANCOUVER BC", "2900 PRINCE EDWARD ST, VANCOUVER BC"), each = 5)`. If you run `geocode(x)`, you will see five entries for each address in the output.

x <- rep(c("800 BEATTY ST, VANCOUVER BC", "2900 PRINCE EDWARD ST, VANCOUVER BC",
           "3600 KINGSWAY AVE, VANCOUVER BC"), times = c(3, 3, 1))

library(ggmap)
foo <- geocode(x)
foo2 <- cbind(foo, x)

#        lon      lat                                   x
#1 -123.1139 49.27763         800 BEATTY ST, VANCOUVER BC
#2 -123.1139 49.27763         800 BEATTY ST, VANCOUVER BC
#3 -123.1139 49.27763         800 BEATTY ST, VANCOUVER BC
#4 -123.0963 49.25880 2900 PRINCE EDWARD ST, VANCOUVER BC
#5 -123.0963 49.25880 2900 PRINCE EDWARD ST, VANCOUVER BC
#6 -123.0963 49.25880 2900 PRINCE EDWARD ST, VANCOUVER BC
#7 -122.7899 49.26511     3600 KINGSWAY AVE, VANCOUVER BC
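
Because many addresses repeat, a possible variation is to geocode each distinct address once and then copy the coordinates back to every row. The following is only a minimal sketch under stated assumptions: `city`, `uni`, `coords`, `idx`, and `result` are illustrative names, and `coords` is hard-coded from the output above as a stand-in for `geocode(uni)` so that the sketch runs without network access.

```r
# Addresses as in the example above; `city` is an illustrative name.
city <- rep(c("800 BEATTY ST, VANCOUVER BC",
              "2900 PRINCE EDWARD ST, VANCOUVER BC",
              "3600 KINGSWAY AVE, VANCOUVER BC"), times = c(3, 3, 1))

# Keep each distinct address once, in order of first appearance.
uni <- unique(city)

# Assumption: stand-in for `coords <- geocode(uni)`, hard-coded from the
# output shown above. One row per element of `uni`, in the same order.
coords <- data.frame(lon = c(-123.1139, -123.0963, -122.7899),
                     lat = c(49.27763, 49.25880, 49.26511))

# For every element of `city`, find the position of its address in `uni`.
idx <- match(city, uni)

# Repeat the coordinate rows accordingly and attach them as new columns.
result <- data.frame(address = city, coords[idx, ], row.names = NULL)
```

Using `match()` avoids having to work out repeat counts by hand. Counts from `table()` could also be passed to `rep()`, but `table()` sorts its names, so they would first have to be aligned with the order of the geocoded addresses.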
jazzurro
  • I wanted to try it, but my table has 8000 entries. Is there a limit I can set in geocode? – zazu Nov 17 '14 at 05:41
  • @zazu What do you mean by a limit? – jazzurro Nov 17 '14 at 05:44
  • Also, I tried `uni <- city[!duplicated(city)]`, which leaves me with about 3800 observations. That is still too large for `GPS <- geocode(uni)`. What I am trying to say is that my dataset contains about 8000 addresses with repeated entries, of which about 3800 are unique. – zazu Nov 17 '14 at 05:47
  • When I try to do a geocode it says Error: google restricts requests to 2500 requests a day. – zazu Nov 17 '14 at 05:47
  • @zazu Ah, that 2500 limit. Given that, I think you want to divide `uni` into two. Or you get 2500 today and the rest tomorrow. When you have to replicate some addresses, you already have the lon/lat in hand. All you need is to find how many times to copy each lon/lat. – jazzurro Nov 17 '14 at 05:51
  • If I go the route of selecting unique addresses, how would I then repopulate the remaining rows that are duplicates? For example: rows 1 and 3 are duplicates, `uni` drops row 3, and we geocode row 1, but now row 3 won't have coordinates. Is there a way to link the two together? Or should I just split my `city` variable into 4 chunks and geocode that way? – zazu Nov 17 '14 at 06:10
  • @zazu I am not sure exactly what your data look like. But you can count how many times each address is duplicated. For example, `table(x)` will tell you how many times each address appears. You can keep the counts from the table as a vector and use it as `times` above. It is a bit hard to explain by typing, but I hope you get the idea. If this is too complicated, you can divide your data into four pieces. – jazzurro Nov 17 '14 at 06:25
  • Thank you, you are absolutely amazing! It relieved some of the headache I had. I will try to split the dataset into several pieces! – zazu Nov 17 '14 at 06:34
  • @zazu Let me know how it goes. If you need more support, drop me a line. – jazzurro Nov 17 '14 at 06:42
  • So I decided to stick to basics and did this: `df1<-city[1:2500]`, `GPS1<-geocode(df1)`, `df2<-city[2500:5000]`, `GPS2<-geocode(df2)`, etc., in 4 chunks, and then `GPS<-rbind(GPS1,GPS2,GPS3,GPS4)`. So now I have to figure out how to bind those columns to the main dataset. – zazu Dec 04 '14 at 02:26
  • @zazu Good to hear that you made progress. You already used `rbind`. I'm not sure what your main data look like, but I suppose you probably have to use `cbind`. – jazzurro Dec 04 '14 at 02:35
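
The chunk-then-bind approach discussed in the comments could be sketched roughly as follows. Note that index ranges such as `1:2500` followed by `2500:5000` share element 2500, so the combined result would have one row too many for `cbind`; building the chunks with `split()` avoids the overlap. This is only an illustration under stated assumptions: `fake_geocode` is a hypothetical offline stand-in for `geocode()`, and `main`, `chunk_size`, `chunk_id`, `chunks`, and `main_geo` are made-up names and data.

```r
# Hypothetical stand-in for ggmap::geocode() so the sketch runs offline:
# returns one (lon, lat) row per input address, filled with NA.
fake_geocode <- function(addresses) {
  data.frame(lon = rep(NA_real_, length(addresses)),
             lat = rep(NA_real_, length(addresses)))
}

# Made-up main dataset of roughly the size mentioned in the comments.
main <- data.frame(city = paste(seq_len(8000), "EXAMPLE ST, VANCOUVER BC"),
                   stringsAsFactors = FALSE)

chunk_size <- 2500
# Chunk number for every row: 1 for rows 1-2500, 2 for rows 2501-5000, ...
chunk_id <- ceiling(seq_len(nrow(main)) / chunk_size)
chunks <- split(main$city, chunk_id)

# Geocode chunk by chunk (with a daily quota, e.g. one chunk per day).
gps_list <- lapply(chunks, fake_geocode)
GPS <- do.call(rbind, gps_list)
rownames(GPS) <- NULL

# The chunks preserve row order, so the coordinates line up with `main`.
stopifnot(nrow(GPS) == nrow(main))
main_geo <- cbind(main, GPS)
```

Because `cbind` pairs rows purely by position, the row-count check before it is a cheap guard against overlapping or missing chunks.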