I tried running the code from this blog post to geocode locations in R via Google Maps and the XML package: http://www.r-chart.com/2010/07/maps-geocoding-and-r-user-conference.html

Here are his functions:

getDocNodeVal=function(doc, path){
  sapply(getNodeSet(doc, path), function(el) xmlValue(el))
}

gGeoCode=function(str){
  library(XML)
  u=paste('http://maps.google.com/maps/api/geocode/xml?sensor=false&address=',str)
  doc = xmlTreeParse(u, useInternal=TRUE)
  str=gsub(' ','%20',str)
  lng=getDocNodeVal(doc, "/GeocodeResponse/result/geometry/location/lat")
  lat=getDocNodeVal(doc, "/GeocodeResponse/result/geometry/location/lng")
  c(lat,lng)
}

When I run gGeoCode(), I get the following error:

> gGeoCode("Philadelphia, PA")
failed to load external entity "http%3A//maps.google.com/maps/api/geocode/xml%3Fsensor=false&address=%20Philadelphia,%20PA"
Error: 1: failed to load external entity "http%3A//maps.google.com/maps/api/geocode/xml%3Fsensor=false&address=%20Philadelphia,%20PA"

If I paste the API URL into a browser with "Philadelphia, PA" appended to the end, like the string passed to xmlTreeParse, I get what looks like legitimate XML when I download it.

Is this an issue with the code, or have I failed to configure something?

JoFrhwld

6 Answers

23

Have you thought about using the JSON call instead? Looking at your code, you could achieve the same result like this (you'll need to install the RCurl and RJSONIO packages from omegahat.com).

Copy and paste this into R:

library(RCurl)
library(RJSONIO)

construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
  root <- "http://maps.google.com/maps/api/geocode/"
  u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
  return(URLencode(u))
}

gGeoCode <- function(address,verbose=FALSE) {
  if(verbose) cat(address,"\n")
  u <- construct.geocode.url(address)
  doc <- getURL(u)
  x <- fromJSON(doc,simplify = FALSE)
  if(x$status=="OK") {
    lat <- x$results[[1]]$geometry$location$lat
    lng <- x$results[[1]]$geometry$location$lng
    return(c(lat, lng))
  } else {
    return(c(NA,NA))
  }
}

Here's how you use the above functions:

x <- gGeoCode("Philadelphia, PA")

This is the result you get. Note that lat and lng look swapped in the original code (lng is read from the .../location/lat path and lat from .../location/lng). Hopefully this is what you want:

> x
[1]  39.95233 -75.16379
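
For anyone who hits the "failed to load external entity" error from the question: the message shows a URL whose scheme and query separator were percent-encoded ("http%3A", "%3F"), and a space in front of the address. A minimal sketch of one way to avoid both, assuming the same endpoint as above, is to encode only the address value and join the pieces with paste0. The name build_geocode_url is hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of any package): percent-encode only the
# address value, never the whole URL, so "http://" and "?" stay intact.
build_geocode_url <- function(address, format = "json", sensor = "false") {
  root <- "http://maps.google.com/maps/api/geocode/"
  # reserved = TRUE also escapes characters such as "," and "&" in the address
  encoded <- URLencode(address, reserved = TRUE)
  # paste0 adds no separator, so no stray space ends up in the query string
  paste0(root, format, "?address=", encoded, "&sensor=", sensor)
}

u <- build_geocode_url("Philadelphia, PA")
print(u)
```

The resulting string can be passed to getURL() in the same way as the output of construct.geocode.url.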

Hope that helps a little mate,

Tony Breyal

Ari B. Friedman
Tony Breyal
  • @Tony Breyal: I tried this code on R 2.14.0. Unfortunately I get an error: `Error in x$results[[1]]$geometry$location$lat : $ operator is invalid for atomic vectors`. What am I doing wrong here? – radek Nov 05 '11 at 18:24
  • @radek The JSON code which is returned has changed a little bit by the looks of it. I've updated the code to reflect this by adding simplify=FALSE to fromJSON. Should work now as long as you have an up-to-date version of RJSONIO as the simplify parameter was only added a few versions ago. – Tony Breyal Nov 05 '11 at 23:55
  • @Tony Breyal: The updated version worked like a charm. Thanks a lot for the help! – radek Nov 06 '11 at 14:14
  • @TonyBreyal Nice job! I added some features that make it work better in an `sapply` call when you need to geocode hundreds of addresses (it now returns `NA` rather than a vague error when Google can't geocode the address). Is your function in a package somewhere? – Ari B. Friedman May 28 '12 at 14:22
  • @gsk3 I've never written an R package and I only wrote the functions above specifically to answer this question. Glad to hear you've improved it, you should add it as an answer so others can also benefit :) – Tony Breyal Jun 04 '12 at 08:47
  • @TonyBreyal In the spirit of SO I edited your answer to add the refinements. If you would rather I revert it and add it as a separate answer, I could, but I'm fine with it this way (and you get the points for it ;-) ). If you want me to add `gGeoCode` to my miscellany package `taRifx` so it's more widely available, I could do so. You'd be listed as the author obviously. – Ari B. Friedman Jun 04 '12 at 12:18
  • @gsk3 That's brilliant, I didn't know answers could be edited in that way, fantastic stuff! I'm more than happy for you to do whatever you want with the function :) – Tony Breyal Jun 04 '12 at 15:21
  • @TonyBreyal I have added a version that also takes a vector of addresses as input. For now, I did not replace the previous version and instead placed my one at the end of the post. – user2503795 May 06 '13 at 06:28
  • @user1318686 I don't use this code myself but am happy to take your word for it that it is correct. However I can't "accept" your edit as someone has written on the review page that you should add your solution in a new post, which sounds good to me. Anyway, it looks like a cool update you've made, so well done :) – Tony Breyal May 06 '13 at 08:40
  • Okay, my modified version that also takes a vector of addresses as input is a reply below... – user2503795 May 06 '13 at 15:45
5

This code works using just the XML library:

library(XML)
url = 'http://maps.googleapis.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=true'
doc = xmlTreeParse(url, useInternal=TRUE)
lat = as.numeric(xmlValue(getNodeSet(doc, '//location/lat')[[1]]))
lng = as.numeric(xmlValue(getNodeSet(doc, '//location/lng')[[1]]))
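
If the service returns no result, the `[[1]]` lookups above fail with a subscript error. A minimal sketch of a more defensive parse follows, assuming the response has the shape implied by the XPaths in the question (/GeocodeResponse/status and /GeocodeResponse/result/geometry/location). The sample XML is made up for illustration, and parse_geocode_xml is a hypothetical name.

```r
library(XML)

# Made-up sample, shaped like the XPaths used in the question
sample_xml <- '<GeocodeResponse>
  <status>OK</status>
  <result>
    <geometry>
      <location><lat>39.95233</lat><lng>-75.16379</lng></location>
    </geometry>
  </result>
</GeocodeResponse>'

# Hypothetical helper: returns c(lat, lng), or NAs when nothing usable came back
parse_geocode_xml <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  status <- xpathSApply(doc, "/GeocodeResponse/status", xmlValue)
  lat <- xpathSApply(doc, "/GeocodeResponse/result/geometry/location/lat", xmlValue)
  lng <- xpathSApply(doc, "/GeocodeResponse/result/geometry/location/lng", xmlValue)
  # Check the status and that both nodes exist before converting
  if (!identical(status, "OK") || length(lat) == 0 || length(lng) == 0) {
    return(c(lat = NA_real_, lng = NA_real_))
  }
  # Keep only the first result, as the code above does
  c(lat = as.numeric(lat[1]), lng = as.numeric(lng[1]))
}

coords <- parse_geocode_xml(sample_xml)
print(coords)
```

Parsing from a string keeps the example runnable offline; for a live call, the text would come from the URL used above.
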
Dave
3

This is another option for geocoding - it may be easier to parse:

https://webgis.usc.edu/Services/Geocode/Default.aspx

Greg
2

I have modified Tony Breyal's solution so that the gGeoCode function also takes a vector of addresses as input. With this version, you can call not only gGeoCode("Philadelphia, PA") but also gGeoCode(c("Philadelphia, PA","New York, NY")), which returns:

  address            lat          lng          
1 "Philadelphia, PA" "39.952335"  "-75.163789" 
2 "New York, NY"     "40.7143528" "-74.0059731"

Note that the Google Maps API has a daily limit of 2,500 requests, so your vector shouldn't be too long. Here is the updated function:

library(RCurl)
library(RJSONIO)

construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
  root <- "http://maps.google.com/maps/api/geocode/"
  u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
  return(URLencode(u))
}

gGeoCode <- function(address,verbose=FALSE) {
  require("plyr")
  if(verbose) cat(address,"\n")
  u <- aaply(address,1,construct.geocode.url)
  doc <- aaply(u,1,getURL)
  json <- alply(doc,1,fromJSON,simplify = FALSE)
  coord = laply(json,function(x) {
    if(x$status=="OK") {
      lat <- x$results[[1]]$geometry$location$lat
      lng <- x$results[[1]]$geometry$location$lng
      return(c(lat, lng))
    } else {
      return(c(NA,NA))
    }
  })
  if(length(address)>1) colnames(coord)=c("lat","lng")
  else names(coord)=c("lat","lng")
  return(data.frame(address,coord))
}

EDIT: Small correction in code so that lat and lng are returned as numerical values.
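
To stay under the daily limit mentioned above, the calls can be capped and spaced out. This is only a sketch: geocode_throttled is a hypothetical name, the pause length is an arbitrary assumption, and geocode_fn is assumed to behave like a single-address geocoder that returns c(lat, lng). A stub stands in for the real call so the example runs offline.

```r
# Hypothetical wrapper: applies `geocode_fn` to one address at a time, pausing
# between calls and refusing to exceed `max_requests` (2,500 is the limit
# quoted in this answer; the default pause is an arbitrary assumption).
geocode_throttled <- function(addresses, geocode_fn, pause = 0.2,
                              max_requests = 2500) {
  if (length(addresses) > max_requests) {
    stop("Too many addresses: ", length(addresses), " > ", max_requests)
  }
  coords <- matrix(NA_real_, nrow = length(addresses), ncol = 2,
                   dimnames = list(NULL, c("lat", "lng")))
  for (i in seq_along(addresses)) {
    # Fall back to NA for this address if the call raises an error
    res <- tryCatch(geocode_fn(addresses[i]),
                    error = function(e) c(NA_real_, NA_real_))
    coords[i, ] <- as.numeric(res)
    # No need to wait after the last request
    if (i < length(addresses)) Sys.sleep(pause)
  }
  data.frame(address = addresses, coords, stringsAsFactors = FALSE)
}

# Stub geocoder with made-up coordinates, used instead of a network call
fake_geocode <- function(address) c(lat = 39.95, lng = -75.16)
out <- geocode_throttled(c("Philadelphia, PA", "New York, NY"),
                         fake_geocode, pause = 0)
print(out)
```

Passing the geocoder in as an argument keeps the throttling logic separate from whichever service is actually used.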

user2503795
2

I needed to get all the addresses returned by the geocoder, not just the first one, so I wrote a small function to do so. It can be used both to geocode and to reverse geocode.

geocode <- function(address,reverse=FALSE)  {
  require("RJSONIO")
  baseURL <- "http://maps.google.com/maps/api/geocode/json?sensor=false&"

  # This is not necessary, 
  # because the parameter "address" accepts both formatted address and latlng

  conURL <- ifelse(reverse,
                   paste0(baseURL, 'latlng=', URLencode(address)),
                   paste0(baseURL, 'address=', URLencode(address)))
  con <- url(conURL)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  status <- data.json["status"]
  if (toupper(status) == "OK") {
    t(sapply(data.json$results, function(x) {
      list(address = x$formatted_address,
           lat = x$geometry$location[1],
           lng = x$geometry$location[2])
    }))
  } else {
    warning(status)
    NULL
  }
}

Geocode example:

geocode("Dupont Cir NW, Washington, DC 20036, USA")

     address                                                               lat      lng      
[1,] "Dupont Circle Northwest, Washington, DC 20036, USA"                  38.90914 -77.04366
[2,] "Dupont Circle, 1 Dupont Circle Northwest, Washington, DC 20036, USA" 38.90921 -77.04438
[3,] "Washington, DC 20036, USA"                                           38.90808 -77.04061
[4,] "Dupont Circle, Washington, DC 20036, USA"                            38.90958 -77.04344

Reverse Geocode example:

Note that the address can be either a formatted address or a lat/lng pair. The reverse parameter is not needed here, but it is included for future use with other geocoding services.

geocode("38.910262, -77.043565")

     address                                                    lat      lng      
[1,] "40-58 Dupont Circle Northwest, Washington, DC 20036, USA" 38.91027 -77.04357
[2,] "Washington, DC 20036, USA"                                38.90808 -77.04061
[3,] "Dupont Circle, Washington, DC, USA"                       38.90969 -77.04334
[4,] "Northwest Washington, Washington, DC, USA"                38.94068 -77.06796
[5,] "District of Columbia, USA"                                38.90598 -77.03342
[6,] "Washington, DC, USA"                                      38.90723 -77.03646
[7,] "United States"                                            37.09024 -95.71289
iTech
0

This can also be done with my package googleway and a valid Google Maps API key:

library(googleway)

key <- "your_api_key"

df <- google_geocode("Philadelphia, PA",
                      key = key)

df$results$geometry$location
#        lat       lng
# 1 39.95258 -75.16522

And to reverse-geocode:

df <- google_reverse_geocode(location = c(39.95258, -75.16522),
                             key = key)

df$results$formatted_address
# [1] "1414 PA-611, Philadelphia, PA 19102, USA"           "15th St Station - MFL, Philadelphia, PA 19102, USA"
# [3] "Center City West, Philadelphia, PA, USA"            "Center City, Philadelphia, PA, USA"                
# [5] "Philadelphia, PA, USA"                              "Philadelphia, PA 19107, USA"                       
# [7] "Philadelphia County, PA, USA"                       "Philadelphia-Camden-Wilmington, PA-NJ-DE-MD, USA"  
# [9] "Philadelphia Metropolitan Area, USA"                "Pennsylvania, USA"                                 
# [11] "United States" 
SymbolixAU