I am trying to read many text files into R using read.table. Most of the time we have clean text files which have defined columns.
The data that I am trying to read comes from ftp://ftp.cmegroup.com/delivery_reports/live_cattle_delivery/102317_livecattle.txt
The blank fields and the length of the text files vary from report to report. Compare, for example:
ftp://ftp.cmegroup.com/delivery_reports/live_cattle_delivery/102317_livecattle.txt
ftp://ftp.cmegroup.com/delivery_reports/live_cattle_delivery/100917_livecattle.txt
My objective is to read many of these text files and combine them into a dataset.
If I can read one of them, then combining them should not be an issue. However, I am running into several problems because of the format of the text file:
1) The number of firms varies from report to report. For example, sometimes there will be 3 rows of data to import (i.e. 3 firms that did business on that date) and sometimes there may be 10.
2) Blank fields are not handled correctly. For example, under the FIRM section there should be a column for Deliveries (DEL) and a column for Receipts (REC). Once read in, the data from THIS section should look like:
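# check.names = FALSE keeps the "FIRM_#" column name from being altered
df <- data.frame("FIRM_#" = c(407, 685, 800, 905),
                 "FIRM_NAME" = c("STRAITS FIN LLC", "R.J.O'BRIEN ASSOC", "ROSENTHAL COLLINS LL", "ADM INVESTOR SERVICE"),
                 "DEL" = c(1, 1, 15, 1), "REC" = c(NA, 18, NA, NA),
                 check.names = FALSE)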
However, when I read this in, the formatting is all messed up and the blank values are not filled with NA.
3) The above issues also apply to the "YARDS" and "FUTURE DELIVERIES SCHEDULED" sections of the text file.
I have tried reading in sections of the text file and then formatting them accordingly, but since the number of firms changes from day to day, the code does not generalize.
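To illustrate the section-based idea, here is a minimal sketch rather than my exact code. It assumes the FIRM block starts right after a header line beginning with "FIRM", ends at the first blank line, and keeps each field at fixed character positions. The sample lines, the column positions, and the helper `cut_field` are made up for illustration and are not copied from the actual report.

```r
# Hypothetical sample lines; the real report layout and widths may differ.
firm_names <- c("STRAITS FIN LLC", "R.J.O'BRIEN ASSOC",
                "ROSENTHAL COLLINS LL", "ADM INVESTOR SERVICE")
sample_lines <- c(
  "FIRM  FIRM NAME             DEL  REC",
  # Build fixed-width rows: 6 + 22 + 4 + 5 characters per line
  sprintf("%-6s%-22s%4s%5s",
          c("407", "685", "800", "905"),
          firm_names,
          c("1", "1", "15", "1"),
          c("", "18", "", "")),
  "",
  "YARDS"
)

# Locate the block: first line after the header up to the next blank line,
# so the number of firm rows does not need to be known in advance.
start <- grep("^FIRM", sample_lines)[1] + 1
blank <- which(!nzchar(trimws(sample_lines)))
end   <- min(blank[blank > start]) - 1
block <- sample_lines[start:end]

# Hypothetical helper: cut one fixed-width field and trim padding
cut_field <- function(x, first, last) trimws(substring(x, first, last))

# Empty strings become NA when converted to integer
firms <- data.frame(
  FIRM_NO   = as.integer(cut_field(block, 1, 6)),
  FIRM_NAME = cut_field(block, 7, 28),
  DEL       = as.integer(cut_field(block, 29, 32)),
  REC       = as.integer(cut_field(block, 33, 37)),
  stringsAsFactors = FALSE
)
print(firms)
```

Cutting by character position instead of splitting on whitespace is what keeps a blank REC field from shifting the other columns, but it only works if the positions are stable across reports, which I have not been able to confirm.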
Any help would be greatly appreciated.