0

When working with Zip Codes, often times states with preceding "0"'s will get dropped i.e. Massachusetts addresses with a "02111" in a csv file will be imported and truncated to "2111".

Q1: What is the correct column header for the read.csv to keep ZIP codes intact?

Q2: What is the proper way to export dataframes and keep ZIP codes intact?

Source DF: FDIC

library(readr)
library(dplyr)
library(tidyr)

FDIC_1_source <- read_csv("OFFICES2_ALL.CSV")

NEState<-c("CT", "DC", "DE", "MA", "MD", "ME", "NH", "NJ", "NY", "PA", "RI", "VA", "VT")

FDIC_2_filtered<-FDIC_1_source[FDIC_1_source$STALP %in% NEState,]

write.csv(FDIC_2_filtered,file="FDIC_2_filtered.CSV")

The closest issue I found on SO is With Dates and Numbers however I can't think of a way to use the documentation for Zip codes.

Bluebird
  • 531
  • 1
  • 5
  • 18
  • Are the leading zeroes actually missing in the CSV file, or do they disappear when you open the CSV file in, oh, *Excel* to note a particular bad actor? – rd_nielsen Jun 27 '17 at 18:31
  • Excel always drops the "0" when importing. Although the column can be reclassified as a "ZIP", this won't be saved when well... saving as a csv. Although the reclassification will keep when saving as an excel file. CSV is ideal working on different software platforms. Hence the need to use R to automatically classify a particular column as a "ZIP" and exporting as a "ZIP" as a CSV. – Bluebird Jun 27 '17 at 18:33
  • If the zeros are dropped in the dataframe after reading the csv, then try specifying the column types when reading the csv IE `read_csv("OFFICES2_ALL.CSV", col_types = "character")` – Matt Jewett Jun 27 '17 at 18:38
  • @MattJewett That's a fix, but only if the CSV has the preceding 0. Best way to describe it is that for the source file you'll have Zip codes in the CSV. I would like to import to R and have R recognize that the ZIP column and automatically transform addresses, i.e. for the state Massachusetts zip code "2111" -> "02111". – Bluebird Jun 27 '17 at 18:43

2 Answers2

1

If the leading zeros are missing in the source file. Give this a try, since I do not know the column name for the zip codes, in this example I will just use zip.col as the column name.

FDIC_1_source <- read_csv("OFFICES2_ALL.CSV")

FDIC_1_source$zip.col <- sapply(FDIC_1_source$zip.col, function(x) paste0(paste0(rep("0",5 - nchar(x)), collapse = ""), x))
Matt Jewett
  • 3,249
  • 1
  • 14
  • 21
0

Issue can be resolved with the Zipcode Package. Documentation can be found Here. Will update with code to reflect newfound library.

library(readr)
library(dplyr)
library(tidyr)
library(zipcode)

FDIC_1_source <- read_csv("OFFICES2_ALL.CSV")

NEState<-c("CT", "DC", "DE", "MA", "MD", "ME", "NH", "NJ", "NY", "PA", "RI", "VA", "VT")

FDIC_2_filtered<-FDIC_1_source[FDIC_1_source$STALP %in% NEState,]

FDIC_2_filtered$ZIP = clean.zipcodes(FDIC_2_filtered$ZIP) #this will fix any zip code errors, esecially working with North East States

write.csv(FDIC_2_filtered,file="FDIC_2_filtered.CSV")

Note: Excel is still a pain when reading the subsequent csv file. It will still truncate the preceding "0" but for all other applications, the library will automatically fix the zipcode issue and export just fine. Please thank Jeffrey Breen, the author of the package, for writing an easy to use and elegant solution.

Bluebird
  • 531
  • 1
  • 5
  • 18