
I'm using tidycensus to pull dissertation data for three different years (decennial 2000, ACS 2009-2013, ACS 2015-2019) for all census tracts in the country.

Based on Kyle Walker's tutorial, I've been able to use purrr's `map_df()` to create the call below, which works. The result is a data frame containing every variable listed in the vector for every census tract in the country:

# load packages
library(tidycensus)
library(purrr)

# get vector of state FIPS codes for the US (50 states plus DC)
us <- unique(fips_codes$state)[1:51]



# select my variables
my_vars19 <- c(pop = "B01003_001", 
               racetot = "B03002_001", 
               nhtot = "B03002_002", 
               nhwht = "B02001_002", 
               nhblk = "B02001_003", 
               nhnat = "B02001_004", 
               nhasian = "B02001_005", 
               nhpac = "B02001_006", 
               nhother = "B02001_007",
               nhtwo = "B02001_008", 
               hisp = "B03003_003",             
               male = "B01001_002",
               female = "B01001_026")



# function call to obtain tracts for US
# function call to obtain tracts for the US
acs2019 <- map_df(us, function(x) {
  get_acs(geography = "tract",
          variables = my_vars19,
          year = 2019,
          state = x)
})

glimpse(acs2019)

Rows: 949,728
Columns: 5
$ GEOID    <chr> "01001020100", "01001020100", "01001020100", "01001020100", "01001020100", "01001020100", "…
$ NAME     <chr> "Census Tract 201, Autauga County, Alabama", "Census Tract 201, Autauga County, Alabama", "…
$ variable <chr> "male", "female", "pop", "nhwht", "nhblk", "nhnat", "nhasian", "nhpac", "nhother", "nhtwo",…
$ estimate <dbl> 907, 1086, 1993, 1685, 152, 0, 2, 0, 0, 154, 1993, 1967, 26, 1058, 901, 1959, 759, 1117, 0,…
$ moe      <dbl> 118, 178, 225, 202, 78, 12, 5, 12, 12, 120, 225, 226, 36, 137, 133, 202, 113, 180, 12, 12, …

This is just a practice call, though. I need to pull close to 150 to 200 variables for each year of analysis (2000, 2009-2013, and 2015-2019). I'm worried that pulling so many variables for so many states and census tracts will be very taxing on the API. I also think there is a limit on the number of variables you can pull at once.

I could group calls by type of variable, but I worry that breaking calls into groups could get unwieldy, and I'd also need to combine the results back together (see the sketch below). What is the standard practice for creating a large dataset using tidycensus?
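To be concrete, here's roughly what I imagine the chunked approach would look like. This is a minimal sketch, assuming the Census API's documented 50-variables-per-call limit and keeping the default long (tidy) output so the pieces can simply be row-bound; `var_chunks`, `acs2019_full`, and the chunk size of 48 are illustrative names and values, not anything required:

# split the named variable vector into chunks that stay below
# the Census API's 50-variables-per-call limit
var_chunks <- split(my_vars19, ceiling(seq_along(my_vars19) / 48))

# loop over states and variable chunks; because the output is in
# long (tidy) form, map_df() can safely row-bind all the pieces
acs2019_full <- map_df(us, function(st) {
  map_df(var_chunks, function(v) {
    get_acs(geography = "tract",
            variables = v,
            year = 2019,
            state = st)
  })
})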

Do people usually break up calls, or do they just pull whole tables instead? Or is there a more efficient system than the one I've outlined? I know most people use tidycensus to pull a handful of variables, but what do they do when they need to pull a lot?
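For context, my understanding is that `get_acs()` can take a `table` argument in place of `variables` (the two can't be combined in one call), returning every variable in that table at once. Something like the following, where the state and year are just examples:

# pull every variable in table B03002 (Hispanic or Latino Origin by Race)
# for all tracts in a single state
b03002_al <- get_acs(geography = "tract",
                     table = "B03002",
                     state = "AL",
                     year = 2019)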

kaseyzapatka
  • Can you include the code that works (for one variable or more) in the question itself and explain the expected output you are looking for? – Ronak Shah Jan 29 '21 at 08:15
  • @RonakShah, thanks for the clarification. I've edited the original post to be clearer about what I'm asking. Basically, what is the standard, best practice for pulling large datasets over many years with `tidycensus`? – kaseyzapatka Jan 29 '21 at 23:47
  • Hi @kaseyzapatka! Kyle here. My general recommendation (for anyone else reading this as well) is to use [NHGIS](https://www.nhgis.org/) for bulk Census data pulls. It is designed specifically for researchers who need to do that. Plus, this avoids API problems [like those you mentioned over on GitHub](https://github.com/walkerke/tidycensus/issues/343). – kwalkertcu Feb 01 '21 at 12:20
  • @kwalkertcu, thanks. I've used Social Explorer in the past for big pulls, but since I'm using `tidycensus` more I was hoping to automate the process a bit. NHGIS does seem like the better option, though, and you can usually save your past pulls. Thanks for the note. – kaseyzapatka Feb 01 '21 at 20:52
