
EDIT: Thanks to @user5249203 for pointing out that geocoding is best done with ggmap's geocode function. Watch out for NAs, though.

I am struggling with the apply family in R.

I am using a function that takes a string and returns latitude and longitude:

> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522

I have a simple dataframe which has the names of all 50 states:

dput(state_lat_long)
structure(
  list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

To practice my apply skills, I simply want to apply gGeoCode to each cell in the only column of the state_lat_long dataframe.

Couldn't be much simpler.

Then what is the problem with this?

> View(apply(state_lat_long, function(x) gGeoCode(x)))

When I run this, I get:

Error in View : argument "FUN" is missing, with no default  

which I don't understand, because FUN is not missing.

So, let's try sapply. It's supposed to be simple, right?

But what is wrong with this?

View(sapply(state_lat_long$State, function(x) gGeoCode(x)))

When I run this, I get 2 rows with 50 columns, packed with NAs. I can't make sense of it.

Next, I tried

View(apply(state_lat_long, 2, function(x) gGeoCode(x)))  

and I got

     State
  40.71278
 -74.00594  

Again, this makes no sense!

What am I doing wrong? Thanks.

Monica Heddneck
  • You need to input 3 arguments when you do apply. The first being your object (e.g. dataframe), second indicating whether to apply over rows or columns (you'll want 2 for columns), the third being FUN. In your code, the second argument was missing, so FUN was never supplied. Try View(apply(state_lat_long, 2, function(x) gGeoCode(x))) – hodgenovice Feb 05 '16 at 21:48
  • can you please check out my edit to the original question? – Monica Heddneck Feb 05 '16 at 21:52
  • Maybe I got mixed up and it should be View(apply(state_lat_long, 1, function(x) gGeoCode(x))) ? If not, it's probably less simple than I thought, and I'd need to see the code you used for gGeoCode to be of any help (which I may or may not be). – hodgenovice Feb 05 '16 at 22:02
  • Break the code down and run it one step at a time. The `margin` (`row` or `column`) also matters. It would also help to see how you generated the `dataframe`. – user5249203 Feb 05 '16 at 22:12
  • It also helps to check whether you are passing the correct arguments that the function `gGeoCode` is expecting. – user5249203 Feb 05 '16 at 22:38
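
The points raised in these comments about apply's arguments can be sketched as follows. The signature is apply(X, MARGIN, FUN, ...), so a call with only two positional arguments binds the function to MARGIN and leaves FUN unset. The source of gGeoCode is not shown, so `fake_geocode` below is a hypothetical stand-in with made-up coordinates. It assumes a geocoder that only looks at the first element of its input, which would be consistent with the single New York pair returned in the question (New York is the first row of the data).

```r
# Hypothetical stand-in for gGeoCode(): it only looks at the first element
# of its input and returns made-up coordinates. It also counts its calls.
calls <- 0
fake_geocode <- function(x) {
  calls <<- calls + 1
  n <- as.numeric(nchar(x[[1]]))
  c(lat = n, lon = -n)
}

states <- data.frame(State = c("new york", "nevada", "texas"),
                     stringsAsFactors = FALSE)

# Two positional arguments: the function is matched to MARGIN and FUN is
# left unset, which triggers the "FUN is missing" error.
msg <- tryCatch(apply(states, fake_geocode),
                error = function(e) conditionMessage(e))
print(msg)

# MARGIN = 2 hands FUN each whole column, so FUN runs once per column.
calls <- 0
by_col <- apply(states, 2, fake_geocode)
calls_by_col <- calls

# MARGIN = 1 hands FUN one row at a time, so FUN runs once per state.
calls <- 0
by_row <- apply(states, 1, fake_geocode)
calls_by_row <- calls

print(c(columns = calls_by_col, rows = calls_by_row))
print(dim(by_row))
```

With a one-column data frame, MARGIN = 2 therefore produces a single result for the whole column, while MARGIN = 1 produces one result per state, arranged as one column per row of the input.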

2 Answers


Is this what your data frame looks like?

df = data.frame(State = c(
    32L, 28L, 43L, 5L, 23L, 34L,
    30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
    18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
    17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
    19L, 41L, 50L, 2L, 45L
  ), Label = c(
    "alabama", "alaska", "arizona",
    "arkansas", "california", "colorado", "connecticut", "delaware",
    "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
    "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
    "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
    "montana", "nebraska", "nevada", "new hampshire", "new jersey",
    "new mexico", "new york", "north carolina", "north dakota", "ohio",
    "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
    "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
    "washington", "west virginia", "wisconsin", "wyoming"
  ))

head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado

# apply() passes each row as a character vector, so pick out the state name
apply(df, 1, function(x) gGeoCode(x[["Label"]]))

Alternatively,

# as.character() guards against Label having been stored as a factor
mapply(FUN = gGeoCode, as.character(df$Label), SIMPLIFY = TRUE)

Note: Some states still return NA. Re-running the code fetches the missing coordinates. I expect it would work more reliably if we knew your input format and how the data frame was constructed. It is also important to make sure the arguments you pass are what gGeoCode expects.
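
The re-run idea in the note above could be automated along these lines. This is only a sketch under stated assumptions: `flaky_geocode` is a hypothetical stand-in for gGeoCode that deliberately returns NA the first time each place is requested, and `geocode_with_retry` and `max_tries` are illustrative names, not part of any package. It also shows why sapply-style calls give 2 rows and one column per state: each call returns a length-2 vector, and those become columns until the result is transposed.

```r
# Hypothetical stand-in for gGeoCode(): returns NA the first time each place
# is requested, then made-up coordinates, to mimic a flaky web service.
seen <- character(0)
flaky_geocode <- function(place) {
  if (!(place %in% seen)) {
    seen <<- c(seen, place)
    return(c(lat = NA_real_, lon = NA_real_))
  }
  n <- as.numeric(nchar(place))
  c(lat = n, lon = -n)
}

geocode_with_retry <- function(places, geocoder, max_tries = 3) {
  places <- as.character(places)  # factors become their labels, not codes
  template <- c(lat = 0, lon = 0)
  # vapply() returns a 2 x n matrix (one column per place); t() makes it n x 2.
  out <- t(vapply(places, geocoder, template))
  for (attempt in seq_len(max_tries - 1)) {
    todo <- which(is.na(out[, "lat"]))
    if (length(todo) == 0) break
    # Look up only the places that are still missing.
    out[todo, ] <- t(vapply(places[todo], geocoder, template))
  }
  out
}

coords <- geocode_with_retry(factor(c("texas", "ohio")), flaky_geocode)
print(coords)
```

Against a real service it would probably be wise to pause between attempts (for example with Sys.sleep), since repeated NAs may come from request limits rather than from the input.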

user5249203
  • So the reason the states throw `NA` is due to the function, not the apply. I see. – Monica Heddneck Feb 05 '16 at 23:36
  • I think you need to understand how the function works. When you pass the name of a state on its own, it does give coordinates. The issue here is the way you pass it, or what you pass to the function. apply or mapply just lets you apply the function without a for loop. But you need to know which coordinates are correct and which are wrong. – user5249203 Feb 06 '16 at 06:31

I realise this question was primarily about *apply, but if you were only after geocodes, an easier option would be to use a vectorised function such as ggmap::geocode:

state_lat_long <- structure(
    list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

library(ggmap)

## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#           lon      lat
# 1   -74.00594 40.71278
# 2  -116.41939 38.80261
# 3   -99.90181 31.96860
# 4  -119.41793 36.77826
# 5   -94.68590 46.72955
# 6  -101.00201 47.55149
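
Since lookups can come back as NA, it may help to attach the coordinates to the state names and list the rows that failed. The sketch below sends no request: `coords` is a hand-made stand-in for the data frame that ggmap::geocode would return, with one NA row inserted purely for illustration, and the names `result` and `failed` are hypothetical.

```r
states <- factor(c("new york", "nevada", "texas"))

# Hand-made stand-in for the lon/lat data frame returned by a geocoding call.
# The NA row imitates a failed lookup; it is not a real result.
coords <- data.frame(lon = c(-74.00594, NA, -99.90181),
                     lat = c(40.71278, NA, 31.96860))

# as.character() keeps the labels; as.integer() would give the level codes.
result <- cbind(State = as.character(states), coords,
                stringsAsFactors = FALSE)

# States still missing a coordinate, i.e. candidates for a second lookup.
failed <- result$State[!complete.cases(result[, c("lon", "lat")])]
print(failed)
```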
tospig