The following script queries Census Bureau information about places and the regions they belong to, using the acs package. Problem: how can the script be adjusted so that it produces output for every input city, not just one?

Script

dput(data)

dat <- c("Albuquerque, NM", "Alpine, UT", "Anacortes, WA", "Anchorage, AK", "Ann Arbor, MI", "Arlington, MA", "Arlington, VA", "Artesia, CA", "Asheville, NC", "Astoria, NY", "Athens, GA", "Atlanta, GA", "Austin, TX", "Baltimore, MD", "Bellevue, WA", "Sunnyvale, CA")


# Load packages
library(dplyr)    # %>%, left_join()
library(tidyr)    # separate()
library(tigris)   # County information
data(fips_codes)
library(acs)      # Census query

# Separate place names and state abbreviations (needed for queries below)
dat <- data.frame(dat)
dat <- dat %>% separate(dat, c("place", "state_name"), sep = ",\\s*")

# Get state names and abbreviations
states <- cbind(state.name, state.abb) %>% tbl_df()

# Script for a single query:

fips_codes <-fips_codes[c("state","state_code","county_code","county")]
colnames(fips_codes) = c("state.abb", "statefp", "countyfp", "county.name")
# Query county FIPS codes, join tables
output <- geo.lookup(state = "GA", place = "Athens")[2,] %>%
  tbl_df() %>%
  left_join(states, by = "state.name") %>%
  left_join(fips_codes, by = c("county.name", "state.abb"))

output

# A tibble: 1 x 8
  state state.name   county.name place                                        place.name state.abb statefp countyfp
  <chr>      <chr>         <chr> <int>                                             <chr>     <chr>   <chr>    <chr>
1    13    Georgia Clarke County  3440 Athens-Clarke County unified government (balance)        GA      13      059

As you can see, the script gives an output for a single entry, i.e. geo.lookup(state = "GA", place = "Athens").

How can I change the script so that it loops over all items in dat and builds one data frame with a row per city, containing the place, state abbreviation, state, county, countyfp and so on? dat is already separated into place and state abbreviation.

Bonus: It would be great to see whether the acs package can also help to get the place/county related msa information.

Thanks!

Christopher

1 Answer

You can run the geo.lookup() request for each city inside lapply() as follows:

dat <- strsplit(c("Athens, GA", "Albuquerque, NM", "Alpine, UT", "Anacortes, WA"),",")

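# Load packages
library(tigris)   # County information
data(fips_codes)
library(acs)      # Census query
data(fips.state)  # state FIPS table from acs, so load it after library(acs)

theGeoLookups <- lapply(dat, function(x) {
    # x[1] is the place name, x[2] the state abbreviation
    aLookup <- geo.lookup(state = trimws(x[2]), place = trimws(x[1]))[2,]
    # only return if we receive valid place data from geo.lookup()
    if("place.name" %in% colnames(aLookup)) return(aLookup)
    else return(NULL)
})
# rbind() skips the NULL entries, so only valid lookups are combined
thePlaces <- do.call(rbind, theGeoLookups)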

The output is a single data frame with the content from geo.lookup().

> thePlaces
   state state.name       county.name place                                        place.name
2     13    Georgia     Clarke County  3440 Athens-Clarke County unified government (balance)
21    35 New Mexico Bernalillo County  2000                                  Albuquerque city
22    49       Utah       Utah County   540                                       Alpine city
23    53 Washington     Skagit County  1990                                    Anacortes city

UPDATE (02Dec2017): As I did some additional testing on the complete list of cities provided by @Christopher, I noticed that sometimes geo.lookup() fails to return valid data at the place level, so the output only has 2 columns: state and state.name.

> # failure case: Astoria, NY
> 
> geo.lookup(place="Astoria",state="NY")
  state state.name
1    36   New York
> 

In this situation, do.call(rbind,theGeoLookups) fails because the data frames do not all have the same columns. This is easily mitigated with some additional logic within the anonymous function in lapply(), which I added to the original code block.
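
The failure and the mitigation can be reproduced without calling the Census API. The following is a minimal sketch: mock_lookup() and keep_valid() are hypothetical stand-ins written for this illustration (they are not part of acs), and the returned values are copied from the outputs shown above.

```r
# Hypothetical stand-in for geo.lookup(): returns place-level columns for a
# known place and only the two state-level columns otherwise.
mock_lookup <- function(place, state) {
  if (place == "Athens") {
    data.frame(state = 13, state.name = "Georgia",
               county.name = "Clarke County", place = 3440,
               place.name = "Athens-Clarke County unified government (balance)",
               stringsAsFactors = FALSE)
  } else {
    data.frame(state = 36, state.name = "New York", stringsAsFactors = FALSE)
  }
}

good <- mock_lookup("Athens", "GA")
bad  <- mock_lookup("Astoria", "NY")

# rbind() on data frames with different columns raises an error;
# capture its message instead of stopping the script.
mismatch <- tryCatch(rbind(good, bad), error = function(e) conditionMessage(e))

# Assumed helper: keep a lookup only if it carries place-level data.
keep_valid <- function(x) if ("place.name" %in% colnames(x)) x else NULL

# NULL entries are skipped by rbind(), so only the valid lookup remains.
combined <- do.call(rbind, lapply(list(good, bad), keep_valid))
```

Here mismatch holds the error message from the direct rbind(), while combined contains only the Athens row.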

Len Greski
  • Before I mark as solved: For some reason, the county FIPS is missing? And the city name of the input dat should be at the front. Tried rbind(dat,theGeoLookups) but it didn't work. – Christopher Nov 28 '17 at 07:35
  • The output fields should be called "statefp, countyfp, classfp". – Christopher Nov 28 '17 at 07:43
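
One possible way to address the two comments above, sketched with base R only and without calling the Census API. Everything here is an assumption for illustration: thePlaces is a hand-built stand-in for the combined geo.lookup() result (values copied from the output above), fips_small is a hypothetical two-row excerpt that borrows the column names of the tigris fips_codes table, and input is a hypothetical vector of the original search strings. In the real workflow the input string would have to be attached inside the lapply() function, because failed lookups are dropped and row positions no longer match dat. The classfp field is not covered.

```r
# Stand-in for the combined geo.lookup() result (first two rows shown above)
thePlaces <- data.frame(
  state = c(13, 35),
  state.name = c("Georgia", "New Mexico"),
  county.name = c("Clarke County", "Bernalillo County"),
  place = c(3440, 2000),
  place.name = c("Athens-Clarke County unified government (balance)",
                 "Albuquerque city"),
  stringsAsFactors = FALSE
)

# Hypothetical: the original search strings, in the same row order
thePlaces$input <- c("Athens, GA", "Albuquerque, NM")

# Hypothetical two-row excerpt using the fips_codes column names
fips_small <- data.frame(
  state_name = c("Georgia", "New Mexico"),
  state_code = c("13", "35"),
  county_code = c("059", "001"),
  county = c("Clarke County", "Bernalillo County"),
  stringsAsFactors = FALSE
)

# Join on state name and county name; all.x keeps places without a match
withCounty <- merge(thePlaces, fips_small,
                    by.x = c("state.name", "county.name"),
                    by.y = c("state_name", "county"),
                    all.x = TRUE)

# Rename to the requested field names and move the input string to the front
names(withCounty)[names(withCounty) == "state_code"] <- "statefp"
names(withCounty)[names(withCounty) == "county_code"] <- "countyfp"
withCounty <- withCounty[, c("input", setdiff(names(withCounty), "input"))]
```

With the real tables, fips_small would be replaced by fips_codes from tigris, and the join keys would need checking against the county names that geo.lookup() actually returns.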