
I'm new to both this forum and R. I'm working on an ecological study and using census data to build a dataset for Washington State, divided by zip code, with the following variables: "total_pop", "median age", "median age men", "median age women", "total pop men", "total pop women", "white", "black", "american indian/alaskan", "asian", "native hawaiian", "other race", "2+ races". I used a package I found online called tidycensus to get data from the ACS 5-year estimates, and I ran into a couple of issues that I was hoping you could help me with.

My main issue is that the dataset isn't laid out the way I want. I had envisioned an output with zip codes as the rows and the variables as the columns, so that each zip code has its 14 associated variables in a single row. For example, for a given zip code (12345) in the year 2018, I would like the top configuration rather than the bottom configuration, which is what I am currently getting. (Image: desired configuration)

This is an example of the data I currently get. (Image: current dataset)

Another issue is that the get_acs() function in the package can't return zip codes for a particular state, only for the entire US, so most of the zip codes in my dataset are ones I don't need. If I were to find all the zip codes in Washington State, is there a way to keep only those? Thank you all in advance for the help. I'd like to reiterate that I am very much a novice in R, so any and all help would be greatly appreciated.

stefan
  • Please add data using `dput` or something that we can copy and use. Also show expected output for the data shared. Read about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Oct 27 '20 at 01:45
  • If you use the argument `output = "wide"` in `get_acs()`, tidycensus will give you back the data in wide format, as you want. – kwalkertcu Oct 28 '20 at 12:42

1 Answer


Your main issue is known as converting a dataset from long format to wide format.

First, you can rename your variables directly in the get_acs() call, since working with the raw variable codes is probably hard.

library(tidycensus)

country <- get_acs(geography = "zcta", 
                   # Include and rename desired variables here
                   variables = c(totPop = "B01003_001",
                                 medAge = "B01002_001"), 
                   year = 2018)

Then I subsetted the data to include only WA zip codes (98001-99403). I also removed the margin of error column, because with it present spread() cannot collapse each zip code into a single row. Hopefully, you don't need it.

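library(dplyr)

wa <- country %>%
  # GEOID is a character column, so convert it before the numeric range check
  filter(as.numeric(GEOID) >= 98001 & as.numeric(GEOID) <= 99403) %>%
  select(-moe)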
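The filtering step can be tried without downloading anything. Below is a minimal sketch on a made-up stand-in for the long get_acs() output; the name `country_toy` and all of its values are invented for illustration and are not real ACS estimates. It assumes, as above, that the 98001-99403 range covers the Washington codes of interest.

```r
library(dplyr)

# Toy stand-in for the long output of get_acs(); all values are invented
country_toy <- tibble(
  GEOID    = c("01001", "98001", "98001", "99403", "99501"),
  variable = c("totPop", "totPop", "medAge", "totPop", "totPop"),
  estimate = c(100, 200, 35.5, 300, 400),
  moe      = c(10, 20, 1.5, 30, 40)
)

wa_toy <- country_toy %>%
  # Convert the character GEOID to a number before the range check
  filter(as.numeric(GEOID) >= 98001 & as.numeric(GEOID) <= 99403) %>%
  # Drop the margin of error column, as in the step above
  select(-moe)

wa_toy
```

Only the rows whose code falls inside the range are kept, and the codes stay stored as text, so a leading zero such as in "01001" is not lost in the original data.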

Finally, convert the dataset from long to wide using the spread() function from the tidyr package.

library(tidyr)

wide_wa <- wa %>%
  spread(variable, estimate)

Hopefully, you get something like this.
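If the margins of error are needed after all, one option is tidyr's pivot_wider(), the newer function that supersedes spread() and can widen several value columns at once. This is a sketch on invented rows, not the output of the code above; `long_toy` and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy long-format data shaped like get_acs() output; all values are invented
long_toy <- tibble(
  GEOID    = c("98001", "98001", "98002", "98002"),
  variable = c("totPop", "medAge", "totPop", "medAge"),
  estimate = c(200, 35.5, 150, 41.2),
  moe      = c(20, 1.5, 15, 2.1)
)

wide_toy <- long_toy %>%
  # One row per GEOID; one estimate_* and one moe_* column per variable
  pivot_wider(names_from = variable, values_from = c(estimate, moe))

wide_toy
```

Each zip code ends up on a single row, with columns named by pasting the value column and the variable name together, for example `estimate_totPop` and `moe_totPop`.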

tonybot
  • Thank you so much for the help. I had actually realized that I needed the spread command to go from long to wide, but the function wouldn't complete because of the size of the dataset before excluding the unneeded zip codes (which you've just shown me how to do). It looks exactly how I wanted, thank you kind stranger. – Zora Singh Oct 27 '20 at 00:35
  • @ZoraSingh If this solution answers your question, please accept it as the answer (and upvote it). – G5W Dec 17 '20 at 01:16