I have many years of data to read from tab-delimited .txt files into data.frame or data.table format for work in R. For each year, the quarterly files need to be appended. My searching turned up some nice code that finds all the quarterly files and, using fread and bind_rows, creates one annual file (code from @Maiasaura).
One oddity I've found: using fread instead of read.table gives different classes for some columns. pat_age is meant to be alphanumeric ("00", "01", "02"). read.table seems to handle this as expected, whereas fread creates an integer. So I've added colClasses to control the class of pat_age.
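A small self-contained illustration of the behaviour described above, using an in-memory string instead of a real quarterly file (the two-column layout and the extra column `x` are made up for the example). By default fread's type detection reads "00", "01", "02" as integers, dropping the leading zeros; overriding just that column with colClasses keeps the codes as text. Recent data.table versions also document a `keepLeadingZeros` argument for this situation, which is not exercised here.

```r
library(data.table)

# Toy tab-delimited data standing in for one quarterly file (assumed layout)
txt <- "pat_age\tx\n00\t1\n01\t2\n02\t3"

# Default type detection: the age codes look like integers
auto <- fread(text = txt, header = TRUE)
print(class(auto$pat_age))

# Override only pat_age so the codes stay as character strings
forced <- fread(text = txt, header = TRUE, colClasses = c(pat_age = "character"))
print(forced$pat_age)
```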
Unfortunately, column names across the quarterly files are sometimes upper case and sometimes lower case (PAT_AGE vs. pat_age). Is there any way to control that as I read in the .txt files? Using colClasses with tolower didn't work for me.
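Why the case matters: bind_rows() matches columns by exact, case-sensitive name. The toy tables below (`q1`, `q2` are made-up stand-ins for two quarters) show that binding them as-is produces two half-empty columns, while lower-casing each table's names first lines them up.

```r
library(data.table)
library(dplyr)

# Two toy quarterly tables that differ only in the case of one column name
q1 <- data.table(PAT_AGE = c("00", "01"))
q2 <- data.table(pat_age = c("02", "03"))

# Names are matched exactly, so PAT_AGE and pat_age become separate
# columns, each NA for the rows that came from the other table
mixed <- bind_rows(q1, q2)
print(names(mixed))

# Lower-case each table's names before binding; copy() avoids renaming
# q1 and q2 themselves, since setnames() modifies by reference
fixed <- bind_rows(lapply(list(q1, q2), function(dt) setnames(copy(dt), tolower(names(dt)))))
print(fixed$pat_age)
```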
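library(data.table)
library(dplyr)  # provides %>% and bind_rows()
# %>% must end a line; a line starting with %>% is a syntax error in R
tabtest <- list.files(pattern = ".*PUDF.*base.*tab.*", full.names = TRUE) %>%
  lapply(fread, header = TRUE, colClasses = c(pat_age = "character")) %>%  # only matches files whose header says pat_age
  dplyr::bind_rows()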
I expect messy data, and I may need to adjust other column names and classes as I move from year to year.
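One possible way to keep per-year adjustments in one place is sketched below, assuming each file has a single tab-delimited header row. `read_quarter` and its `classes` argument are hypothetical names: `classes` is keyed by the lower-case column name, the helper peeks at each file's header with `nrows = 0`, re-keys the requested classes to that file's own spelling (PAT_AGE or pat_age), reads the file, then lower-cases all names.

```r
library(data.table)
library(dplyr)

# Hypothetical helper. `classes` is keyed by lower-case column name,
# e.g. c(pat_age = "character"); add entries as other years need them.
read_quarter <- function(path, classes = c(pat_age = "character")) {
  # Read only the header (nrows = 0) to see how this file spells its columns
  header <- names(fread(path, nrows = 0, header = TRUE))
  low <- tolower(header)
  # Re-key the requested classes by this file's own spelling of each name,
  # skipping any requested column the file does not contain
  hit <- low %in% names(classes)
  cc <- classes[low[hit]]
  names(cc) <- header[hit]
  dt <- fread(path, header = TRUE, colClasses = if (length(cc) > 0) cc else NULL)
  # Standardise names to lower case so bind_rows() can line the files up
  setnames(dt, low)
  dt
}
```

With the file list from the question, `bind_rows(lapply(files, read_quarter))` would then build the annual table; the same class map is reused for every quarter regardless of header case.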
NOTE: Am I correct that if I can't change the case within the lapply call, I'd need to do it in the .txt files themselves? The colClasses argument only matches "pat_age" if it is lower case in every file.
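The renaming can happen inside the lapply call, so the .txt files themselves need not be edited. The sketch below (the helper name `read_all_as_text` is hypothetical) wraps fread in an anonymous function that reads every column as character, so codes like "00" survive whatever the header case, and then lower-cases the names before binding. The trade-off is that genuinely numeric columns must be converted afterwards.

```r
library(data.table)
library(dplyr)

# Hypothetical helper: read every column as text so codes like "00" survive,
# then lower-case the names so bind_rows() matches PAT_AGE with pat_age
read_all_as_text <- function(files) {
  tables <- lapply(files, function(f) {
    dt <- fread(f, header = TRUE, colClasses = "character")
    setnames(dt, tolower(names(dt)))  # returns dt, renamed by reference
  })
  bind_rows(tables)
}
```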
NOTE: I came across this question:
fread (data.table) select columns, throw error if column not found
Could that approach be modified to read and fix the header, and then read the entire .txt file with the corrected headers?
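fread has a `col.names` argument, so the "read the header, fix it, then read the whole file" idea can be written without skipping lines by hand. A minimal sketch, assuming one header row per file; `read_lower` is a hypothetical name. Column classes are left to fread's defaults here, so pat_age would still come back as an integer.

```r
library(data.table)

# Hypothetical helper: read the header once, lower-case it, then read the
# full file with header = TRUE and the fixed names supplied via col.names
read_lower <- function(path) {
  header <- names(fread(path, nrows = 0, header = TRUE))
  fread(path, header = TRUE, col.names = tolower(header))
}
```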
Latest attempt; I think it might work. That's a lot of effort and syntax just to change the case of column names!
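read_cols <- function(x) {
  # Read only the header row to get this file's column names
  titles <- fread(x, nrows = 0, header = TRUE, stringsAsFactors = FALSE)
  var.names <- tolower(colnames(titles))
  # header = FALSE: treat the first line after the skip as data, not names
  rest <- fread(x, skip = 1, header = FALSE)
  names(rest) <- var.names
  return(rest)
}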
tabtest2 <- list.files(pattern = ".*PUDF.*base.*tab.*", full.names = TRUE) %>%
  lapply(read_cols) %>%
  dplyr::bind_rows()
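A quick check on the bound result can confirm the header cleanup worked. The helper below (`case_clashes` is a hypothetical name) lists column names that collide once case is ignored; a non-empty result means some files were bound before their names were normalised, e.g. both PAT_AGE and pat_age survived. Running `case_clashes(tabtest2)` should give `character(0)` when every file was lower-cased.

```r
# Hypothetical helper: return the column names that differ only by case
case_clashes <- function(df) {
  low <- tolower(names(df))
  names(df)[low %in% low[duplicated(low)]]
}
```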
Thank you.