
I'm trying to reverse geocode with R. I first used ggmap but couldn't get it to work with my API key. Now I'm trying it with googleway.

newframe[,c("Front.lat","Front.long")]

  Front.lat Front.long
1 -37.82681   144.9592
2 -37.82681   145.9592

newframe$address <- apply(newframe, 1, function(x){
  google_reverse_geocode(location = as.numeric(c(x["Front.lat"], x["Front.long"])),
                         key = "xxxx")
})

This extracts the variables as a list but I can't figure out the structure.

I'm struggling to figure out how to extract the address components listed below as variables in newframe:

postal_code, administrative_area_level_1, administrative_area_level_2, locality, route, street_number

I would prefer each address component as a separate variable.

M--
  • I am not sure why you need your API here. If you have the coordinates in a dataframe ggmap works fine. Clarify on that and I may post a googleway answer as well. – M-- Oct 01 '17 at 18:29
  • @Masoud - the *reason* for using an API key is because Google [says you need to use one](https://developers.google.com/maps/documentation/geocoding/intro) "To use the Google Maps Geocoding API, you need an API key.". But, it does work without it. – SymbolixAU Oct 01 '17 at 21:14
  • I need to use an API key because I want to run more than 2500 lat/lon in a day. – Jonathan Nolan Oct 02 '17 at 05:27
  • `register_google(key = "...")` for [tag:ggmap] – M-- Nov 15 '18 at 16:01

3 Answers


Google's API returns the response as JSON which, when translated into R, naturally forms nested lists. Internally, googleway does this through jsonlite::fromJSON().

In googleway I've given you the choice of returning the raw JSON or a list, through the simplify argument.

I've deliberately returned ALL the data from Google's response and left it up to the user to extract the elements they're interested in through usual list-subsetting operations.
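As a rough sketch of what that list subsetting can look like, the snippet below builds a small mock of a parsed response and pulls a few elements out of it. The mock is an assumption made for illustration: it only mirrors the results / address_components / status nesting discussed in this thread, and a real response contains more fields and more results.

```r
# Hand-built mock of a parsed reverse-geocode response (assumption: a real
# response has more fields; only the nesting used in this thread is mirrored).
components <- data.frame(
  long_name  = c("45", "Clarke Street", "Southbank", "3006"),
  short_name = c("45", "Clarke St", "Southbank", "3006"),
  stringsAsFactors = FALSE
)
components$types <- list(
  "street_number", "route", c("locality", "political"), "postal_code"
)

results <- data.frame(
  formatted_address = "45 Clarke St, Southbank VIC 3006, Australia",
  stringsAsFactors = FALSE
)
results$address_components <- list(components)  # list column, one entry per result

res <- list(results = results, status = "OK")

# Usual list subsetting: status, best-match address, and its components
res$status
res$results$formatted_address[1]
best <- res$results$address_components[[1]]

# Long name of the component whose first type is "postal_code"
first_type  <- vapply(best$types, function(tp) tp[[1]], character(1))
postal_code <- best$long_name[first_type == "postal_code"]
postal_code
```

The same `$` and `[[` steps should apply to whatever a real call returns, as long as the nesting matches.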

Having said all that, in the development version of googleway I've written a few functions to help access elements of various API calls. Here are three of them that may be useful to you:

## Install the development version
# devtools::install_github("SymbolixAU/googleway")

res <- google_reverse_geocode(
  location = c(df[1, 'Front.lat'], df[1, 'Front.long']), 
  key = apiKey
  )

geocode_address(res)
# [1] "45 Clarke St, Southbank VIC 3006, Australia"                    
# [2] "Bank Apartments, 275-283 City Rd, Southbank VIC 3006, Australia"
# [3] "Southbank VIC 3006, Australia"                                  
# [4] "Melbourne VIC, Australia"                                       
# [5] "South Wharf VIC 3006, Australia"                                
# [6] "Melbourne, VIC, Australia"                                      
# [7] "CBD & South Melbourne, VIC, Australia"                          
# [8] "Melbourne Metropolitan Area, VIC, Australia"                    
# [9] "Victoria, Australia"                                            
# [10] "Australia"

geocode_address_components(res)
#        long_name short_name                                  types
# 1             45         45                          street_number
# 2  Clarke Street  Clarke St                                  route
# 3      Southbank  Southbank                    locality, political
# 4 Melbourne City  Melbourne administrative_area_level_2, political
# 5       Victoria        VIC administrative_area_level_1, political
# 6      Australia         AU                     country, political
# 7           3006       3006                            postal_code

geocode_type(res)
# [[1]]
# [1] "street_address"
# 
# [[2]]
# [1] "establishment"      "general_contractor" "point_of_interest" 
# 
# [[3]]
# [1] "locality"  "political"
# 
# [[4]]
# [1] "colloquial_area" "locality"        "political"  
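For the goal in the question (one column per address component), a minimal base-R sketch along the following lines might help. `components_to_row` and `wanted` are hypothetical names, not part of googleway, and the two component tables are hand-built mocks modelled on the output above and on the second coordinate's address, not live API results.

```r
# Component types to turn into columns (taken from the question)
wanted <- c("street_number", "route", "locality",
            "administrative_area_level_2", "administrative_area_level_1",
            "postal_code")

# Hypothetical helper: one components table in, one wide row out.
# Types that are absent from the table become NA.
components_to_row <- function(components, wanted) {
  first_type <- vapply(components$types, function(tp) tp[[1]], character(1))
  values <- components$long_name[match(wanted, first_type)]
  as.data.frame(as.list(setNames(values, wanted)), stringsAsFactors = FALSE)
}

# Mock tables (assumption: shaped like the geocode_address_components() output)
comp1 <- data.frame(
  long_name = c("45", "Clarke Street", "Southbank", "Melbourne City",
                "Victoria", "Australia", "3006"),
  stringsAsFactors = FALSE
)
comp1$types <- list(
  "street_number", "route", c("locality", "political"),
  c("administrative_area_level_2", "political"),
  c("administrative_area_level_1", "political"),
  c("country", "political"), "postal_code"
)

comp2 <- data.frame(
  long_name = c("Cec Dunns Track", "Noojee", "Victoria", "3833"),
  stringsAsFactors = FALSE
)
comp2$types <- list(
  "route", c("locality", "political"),
  c("administrative_area_level_1", "political"), "postal_code"
)

# One row per coordinate, one column per component type
wide <- do.call(rbind, lapply(list(comp1, comp2), components_to_row, wanted = wanted))
wide
```

Because every row is built against the same `wanted` vector, rows with missing components (no street number in the second mock) still bind cleanly.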
SymbolixAU
  • Ok. This makes sense. Unfortunately I couldn't get the dev version of googleway to install on my machine but I've looked through the list format you suggested and am getting closer to a solution. The current script is: newframe$addresstable<-apply(newframe, 1, function(x) as.data.frame(newframe$address$`1`$results$address_components[[1]] [[1]])) But that delivers the value for the first row in every row which isn't quite right. – Jonathan Nolan Oct 02 '17 at 05:07
  • I think the main issue is that I struggle to apply the lapply function to each column of the main data frame, but then inside that lapply function refer to a subsetted list within each column. – Jonathan Nolan Oct 02 '17 at 05:36
  • @JonathanNolan I think you'll also have issues because the response from Google is inconsistent; you won't always get the same number of 'rows' of address components or types, so it's difficult to un-list everything into a data.frame – SymbolixAU Oct 02 '17 at 21:57
  • My plan was once I could figure out how to make a column containing nested datasets that look very similar to $address$'4342'$results$address_components[[1]], I could then convert rows to columns and then match rows and columns that way. The issue is because each list contains the number of the parent row in it (in this case 4342) I'm finding it impossible to reference it with the apply function. I tried using paste0(rownames()) but that doesn't seem to be working. – Jonathan Nolan Oct 03 '17 at 07:05

After reverse geocoding into newframe$address, the address components can be extracted further as follows:

# Flag the valid ("OK" status) responses (other statuses include "ZERO_RESULTS", "REQUEST_DENIED" etc).
sel <- sapply(seq_len(nrow(newframe)), function(x){
  newframe$address[[x]]$status == 'OK'
})

# Get the address_components of the first result (i.e. best match) for each
# successfully geocoded coordinate. Index by which(sel) so that rows with a
# failed request are skipped rather than shifting the results.
address.components <- lapply(which(sel), function(x){
  newframe$address[[x]]$results$address_components[[1]]
})

# Get all possible component types.
all.types <- unique(unlist(lapply(seq_along(address.components), function(x){
  unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
})))

# Get "long_name" values of the address_components for each type present (the other option is "short_name").
all.values <- lapply(seq_along(address.components), function(x){
  types <- unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
  matches <- match(all.types, types)
  address.components[[x]]$long_name[matches]
})

# Bind results into a data frame (keep the values as character, not factor).
all.values <- do.call("rbind", all.values)
all.values <- as.data.frame(all.values, stringsAsFactors = FALSE)
names(all.values) <- all.types

# Add columns and update original data frame.
newframe[, all.types] <- NA
newframe[sel, all.types] <- all.values

Note that I've only kept the first type given per component, effectively skipping the "political" type as it appears in multiple components and is likely superfluous e.g. "administrative_area_level_1, political".
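Keeping the first type relies on the specific type being listed before "political". A small sketch of a variant that drops "political" explicitly is below; it is only an illustration, and the reordered entry is a hypothetical ordering added to show the difference, not something observed in a real response.

```r
# Types as they appear in the example output earlier in this thread
types <- list(
  "street_number", "route", c("locality", "political"),
  c("administrative_area_level_1", "political"),
  c("country", "political")
)

# Approach used above: keep the first type of each component
first_type <- vapply(types, function(tp) tp[[1]], character(1))

# Variant: drop "political" first, fall back to the first type if nothing is left
specific_type <- vapply(types, function(tp) {
  keep <- setdiff(tp, "political")
  if (length(keep) > 0) keep[[1]] else tp[[1]]
}, character(1))

# Both approaches agree on these types
identical(first_type, specific_type)

# Hypothetical ordering with "political" listed first: only the variant
# still returns the specific type
reordered <- c("political", "locality")
reordered[[1]]
setdiff(reordered, "political")[[1]]
```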

ssast
  • That worked thanks! For some reason the last bit binded with numbers so i wrote data.entry.passingdistances2<-do.call(cbind, list(Data.entry.passingdistances, all.values)) instead but otherwise perfect! – Jonathan Nolan Oct 04 '17 at 02:00

You can do this easily with ggmap::revgeocode, as shown below:

library(ggmap)
df <- cbind(df,do.call(rbind,
        lapply(1:nrow(df),
          function(i) 
            revgeocode(as.numeric(
              df[i,2:1]), output = "more")      
                [c("administrative_area_level_1","locality","postal_code","address")])))

#output:
df
#   Front.lat Front.long administrative_area_level_1  locality
#   1 -37.82681   144.9592                    Victoria Southbank
#   2 -37.82681   145.9592                    Victoria    Noojee
#     postal_code                                     address
#   1        3006 45 Clarke St, Southbank VIC 3006, Australia
#   2        3833 Cec Dunns Track, Noojee VIC 3833, Australia

You can add "route" and "street_number" to the variables that you want to extract but, as you can see, the second address does not have a street number, and that will cause an error.
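One way around the missing-column error might be to fill in absent columns with NA before selecting. The sketch below is an assumption-laden illustration: `pick_columns` is a hypothetical helper, and the two one-row data frames are hand-built stand-ins for what revgeocode(..., output = "more") returns, not real API results.

```r
# Hypothetical helper: add any missing columns as NA, then select in a fixed order
pick_columns <- function(x, wanted) {
  for (col in setdiff(wanted, names(x))) {
    x[[col]] <- NA_character_
  }
  x[wanted]
}

wanted <- c("street_number", "route", "locality", "postal_code")

# Mock one-row results (assumption: stand-ins for revgeocode output)
row1 <- data.frame(street_number = "45", route = "Clarke Street",
                   locality = "Southbank", postal_code = "3006",
                   stringsAsFactors = FALSE)
row2 <- data.frame(route = "Cec Dunns Track",
                   locality = "Noojee", postal_code = "3833",
                   stringsAsFactors = FALSE)

# Every row now has the same columns, so rbind no longer fails
out <- do.call(rbind, lapply(list(row1, row2), pick_columns, wanted = wanted))
out
```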

Note: You may also use sub and extract the information from the address.
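A minimal sketch of the sub approach, using the two address strings from the output above. The patterns assume the "..., Locality STATE 1234, Australia" layout seen there and would need adjusting for other address formats.

```r
addresses <- c("45 Clarke St, Southbank VIC 3006, Australia",
               "Cec Dunns Track, Noojee VIC 3833, Australia")

# Four-digit postcode just before ", Australia"
postal_code <- sub("^.* ([0-9]{4}), Australia$", "\\1", addresses)

# Two- or three-letter state abbreviation before the postcode
state <- sub("^.* ([A-Z]{2,3}) [0-9]{4}, Australia$", "\\1", addresses)

# Locality: the text between the last comma before the state and the state itself
locality <- sub("^.*, (.*) [A-Z]{2,3} [0-9]{4}, Australia$", "\\1", addresses)

data.frame(locality, state, postal_code, stringsAsFactors = FALSE)
```

Note that sub returns the input unchanged when the pattern does not match, so results are worth checking for addresses in a different layout.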

Data:

df <- structure(list(Front.lat = c(-37.82681, -37.82681), Front.long = 
      c(144.9592, 145.9592)), .Names = c("Front.lat", "Front.long"), class = "data.frame", 
      row.names = c(NA, -2L))
M--
  • Yes, I had originally gone down that route but have given up because I could not figure out how to add the client and key arguments to the revgeocode function so that I could run more than 2500 per day. – Jonathan Nolan Oct 02 '17 at 06:18