Can you help me extract information from the Google Geocoding API? When I supply the address variable as 560066 or 560065 (PIN codes in Bengaluru), address_components has different lengths (4 and 5 in this case).

This can return the wrong data. Say I want the country: the following would return INDIA for the former and an 'out of bounds' error for the latter.

Is there a universal way to return the country value in both cases?

library(rjson) # load rjson package

getCoordinates <- function(address) {
    url <- paste("http://maps.googleapis.com/maps/api/geocode/json?address=",address,"&sensor=false",sep="")
    map_data <- fromJSON(paste(readLines(url),collapse=""))
    coord <- c(map_data$results[[1]]$geometry$location$lat,map_data$results[[1]]$geometry$location$lng, toupper(map_data$results[[1]]$address_components[[5]]$long_name))
    return(coord)
}

g <- getCoordinates(560066)
Itachi

1 Answer

With the Google API you're not always guaranteed the same 'number' of results for each query, which is the issue you're describing.

What you need to extract is the component whose types field contains country, but to do this you need to know at which position in the list that component sits (if it exists at all).

For this example I'm going to use my googleway package to do the geocoding, as it handles the construction of the API query for you.

library(googleway)

## you need a valid Google API key to use their API
api_key <- "your_api_key"

query1 <- google_geocode(address = "560065", key = api_key)
query2 <- google_geocode(address = "560066", key = api_key)

## the 'country' is in the 'address_components : types' field
# query1$results$address_components

## use an 'lapply' to find the depth of the country field
l <- lapply(query2$results$address_components[[1]]$types, function(x){ 
    'country' %in% x
    })

## so now we know how far into the list we have to go
which(l == TRUE)
# [1] 4

query2$results$address_components[[1]]$long_name[[which(l == TRUE)]]
# [1] "India"

So wrapping this in a function:

getCountry <- function(g){

    l <- lapply(g[['results']][['address_components']][[1]][['types']], function(x){
        'country' %in% x
    })

    return(g[['results']][['address_components']][[1]][['long_name']][[which(l == TRUE)]])
}

getCountry(query1)
# [1] "India"
getCountry(query2)
# [1] "India"
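
The same lookup can be generalised to any component type, and made to return NA instead of failing when the type is absent. This is only a sketch: getComponent is a hypothetical helper, not part of googleway, and mock is a hand-built stand-in that assumes the same shape the code above indexes into (an address_components data frame with a long_name column and a list-column types). It is not real API output.

```r
# Hypothetical helper: long_name of the first component whose 'types'
# contain `type`, or NA if no such component exists.
getComponent <- function(g, type = "country") {
    comps <- g[["results"]][["address_components"]][[1]]
    # one TRUE/FALSE per component
    hit <- vapply(comps[["types"]], function(x) type %in% x, logical(1))
    if (!any(hit)) return(NA_character_)
    comps[["long_name"]][[which(hit)[1]]]
}

# Hand-built stand-in for a geocode result (assumed shape, made-up values)
comps <- data.frame(long_name = c("560066", "Bengaluru", "Karnataka", "India"),
                    stringsAsFactors = FALSE)
comps$types <- list("postal_code",
                    c("locality", "political"),
                    c("administrative_area_level_1", "political"),
                    c("country", "political"))
mock <- list(results = list(address_components = list(comps)))

getComponent(mock, "country")
# [1] "India"
getComponent(mock, "postal_code")
# [1] "560066"
getComponent(mock, "route")
# [1] NA
```

Using vapply rather than lapply gives a plain logical vector, so any() and which() can be applied to it directly.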

To incorporate this into your function, you can do:

getCoordinates <- function(address) {
    url <- paste("http://maps.googleapis.com/maps/api/geocode/json?address=",address,"&sensor=false",sep="")
    map_data <- fromJSON(paste(readLines(url),collapse=""))

    l <- lapply(map_data$results[[1]]$address_components, function(x){
        'country' %in% x[['types']]
    })

    coord <- c(map_data$results[[1]]$geometry$location$lat,map_data$results[[1]]$geometry$location$lng, 
            toupper(map_data$results[[1]]$address_components[[which(l == TRUE)]]$long_name))
    return(coord)
}

getCoordinates(560065)
# [1] "12.9698066" "77.7499632" "INDIA"

getCoordinates(560066)
# [1] "13.0935798" "77.5778529" "INDIA"
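
Note that the function above still errors if a query returns no results, or if no component carries the country type, because which() then returns an empty vector. A minimal defensive sketch for the rjson-style nested list follows. extractCountry is a hypothetical helper, and ok and empty are hand-built stand-ins rather than real responses. It relies on the status field that the geocoding JSON response carries ("OK", "ZERO_RESULTS", and so on).

```r
# Hypothetical helper for the list produced by rjson::fromJSON.
# Returns the upper-cased country name, or NA if it cannot be found.
extractCountry <- function(map_data) {
    # anything other than status "OK" means there is no usable result
    if (!identical(map_data$status, "OK") || length(map_data$results) == 0) {
        return(NA_character_)
    }
    comps <- map_data$results[[1]]$address_components
    # unlist() copes with 'types' arriving as a list or a character vector
    hit <- vapply(comps, function(x) "country" %in% unlist(x[["types"]]),
                  logical(1))
    if (!any(hit)) return(NA_character_)
    toupper(comps[[which(hit)[1]]]$long_name)
}

# Hand-built stand-ins (assumed shape, made-up values)
ok <- list(status = "OK", results = list(list(address_components = list(
    list(long_name = "560065", types = list("postal_code")),
    list(long_name = "India",  types = list("country", "political"))
))))
empty <- list(status = "ZERO_RESULTS", results = list())

extractCountry(ok)
# [1] "INDIA"
extractCountry(empty)
# [1] NA
```

Returning NA rather than stopping lets a loop over many addresses carry on and report the failures at the end.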
SymbolixAU
  • Thanks, but is there a limit to its use? After fixing this, I ran it on a list of 300+ city names; it always stopped at 20 and threw an error, and now even `g <- getCoordinates('AMSTERDAM')` won't work. Any idea? @SymbolixAU – Itachi Dec 08 '16 at 12:04
  • @NemishKanwar There is a limit of 2500 requests per day. As for your error, I can't help without seeing it - maybe it's specific to your 20th entry? Either way, if you have a separate question you should open up a new question, rather than updating an already answered one. – SymbolixAU Dec 08 '16 at 21:12