1

I have a dataset that has information related to properties in different cities and states. It currently looks something like this:

Id| StreetName               | City         | State
=================================================
1 | 100 Railroad Ave         | Elkton       | MD
2 | 414 Grant St             | Pittsburgh   | PA
3 | 1400 John F Kennedy Blvd | Philadelphia | PA
4 | 800 N French St          | Wilmington   | DE

Is there a package in r that I can use to get the zip code per row? I want to create a density map based on the zip codes. My current data includes thousands of rows.

  • 1
    Perhaps relevant: https://stackoverflow.com/questions/45723974/extracting-city-and-state-information-from-a-google-street-address – Jon Spring Jan 03 '20 at 19:09
  • You might consider using census tracts. Zip Codes can be very strange – SmokeyShakers Jan 03 '20 at 19:21
  • You can geocode the addresses, which will likely give you zip codes for each address. Or if you want something less USPS-based and more demographics-based, you could convert that to spatial data and do some type of joining operation with e.g. counties, census tracts, towns, etc. But questions just asking for libraries and tools are off topic for SO; try to make a full [mcve] – camille Jan 03 '20 at 19:33
  • You could look into using the USPS API (https://www.usps.com/business/web-tools-apis/) – BigFinger Jan 03 '20 at 23:07

1 Answers1

3

It is possible, but requires a (free and easy) account from the US Postal Service.

Using the rusps package (more info: https://github.com/hansthompson/rusps)

The output is a little repetitive, and the function doesn't seem to be pipe-able, but it worked well on the data you posted. It shouldn't be too hard to clean up % join to your data (or use on it's own).

addresses <- tibble::tribble(
  ~Id,                ~StreetName,          ~City, ~State,
    1,         "100 Railroad Ave",       "Elkton",   "MD",
    2,             "414 Grant St",   "Pittsburgh",   "PA",
    3, "1400 John F Kennedy Blvd", "Philadelphia",   "PA",
    4,          "800 N French St",   "Wilmington",   "DE"
  )

# devtools::install_github('hansthompson/rusps')
library(rusps)
library(XML)
username <- 'XXXYYYYYZZZZ' # get this quickly and freely 
                           #by signing up at https://registration.shippingapis.com/ (not commercial).

output <- validate_address_usps(street = addresses$StreetName, 
                                city = addresses$City, 
                                state = addresses$State, 
                                username = username)

output

                   Address2         City State  Zip5 Zip4 .attrs
1          100 RAILROAD AVE       ELKTON    MD 21921 5537      0
2              414 GRANT ST   PITTSBURGH    PA 15219 2409      1
3  1400 JOHN F KENNEDY BLVD PHILADELPHIA    PA 19107 3200      2
4           800 N FRENCH ST   WILMINGTON    DE 19801 3594      3
5                        NA           NA    NA    NA   NA      0
6          100 RAILROAD AVE       ELKTON    MD 21921 5537      0
7              414 GRANT ST   PITTSBURGH    PA 15219 2409      0
8  1400 JOHN F KENNEDY BLVD PHILADELPHIA    PA 19107 3200      0
9          100 RAILROAD AVE       ELKTON    MD 21921 5537      1
10             414 GRANT ST   PITTSBURGH    PA 15219 2409      1
11         100 RAILROAD AVE       ELKTON    MD 21921 5537      0
12             414 GRANT ST   PITTSBURGH    PA 15219 2409      0
13 1400 JOHN F KENNEDY BLVD PHILADELPHIA    PA 19107 3200      0
14          800 N FRENCH ST   WILMINGTON    DE 19801 3594      0

All of the columns returned are list-columns for some reason. To clean it up:

unnest(output, cols = everything()) %>% 
    group_by(Address2) %>% 
    slice(1)

# A tibble: 5 x 6
# Groups:   Address2 [5]
  Address2                 City         State Zip5  Zip4  .attrs
  <chr>                    <chr>        <chr> <chr> <chr> <chr> 
1 100 RAILROAD AVE         ELKTON       MD    21921 5537  0     
2 1400 JOHN F KENNEDY BLVD PHILADELPHIA PA    19107 3200  2     
3 414 GRANT ST             PITTSBURGH   PA    15219 2409  1     
4 800 N FRENCH ST          WILMINGTON   DE    19801 3594  3     
5 NA                       NA           NA    NA    NA    0     
mrhellmann
  • 5,069
  • 11
  • 38