I have a folder with over 1000 text files showing pollutant levels from particular air quality stations.
I'm looking to combine all these text files into one csv in R, so i can analyse the data temporally in one space.
Each text file is organised as below, with the unit name, the start time of the particular set of observations and then the columns for the data.
Example of the headers of my text file 1:
Unit 12345678
Start time: Wed Jan 29 10:57:58 2020
**dd/mm/yyyy hh:mm:ss, PM1, PM2.5, PM10, TSP, RHpre, Tpre, DPpre, RHpost, Tpost, DPpost**
29/01/2020 10:59:00, 1.39, 4.70, 17.11, 172.64, 36.10, 23.11, 7.17, 12.49, 41.26, 7.09
29/01/2020 11:00:00, 1.21, 3.64, 15.68, 26.39, 36.59, 23.12, 7.32, 12.41, 41.52, 7.17
29/01/2020 11:01:00, 1.20, 3.65, 15.12, 93.69, 36.51, 23.18, 7.43, 12.39, 41.68, 7.31
29/01/2020 11:02:00, 1.29, 4.09, 11.93, 15.31, 36.19, 23.22, 7.42, 12.30, 41.79, 7.37
29/01/2020 11:03:00, 1.30, 3.74, 9.06, 11.90, 36.04, 23.26, 7.33, 12.27, 41.88, 7.27
29/01/2020 11:04:00, 1.33, 4.31, 18.62, 44.38, 35.98, 23.28, 7.33, 12.21, 41.97, 7.34
Example of text file 2
Unit 12345678
Start time: Wed Jan 29 11:14:46 2020
**dd/mm/yyyy hh:mm:ss, PM1, PM2.5, PM10, TSP, RHpre, Tpre, DPpre, RHpost, Tpost, DPpost**
29/01/2020 11:16:00, 1.29, 4.80, 12.68, 14.96, 36.77, 23.15, 7.69, 14.41, 38.14, 6.58
29/01/2020 11:17:00, 1.24, 3.97, 13.30, 18.04, 37.51, 23.13, 7.58, 14.23, 38.57, 6.76
29/01/2020 11:18:00, 1.13, 3.50, 16.80, 60.72, 37.09, 23.16, 7.80, 14.11, 38.89, 6.84
29/01/2020 11:19:00, 1.33, 4.56, 14.23, 71.32, 38.96, 23.22, 8.25, 14.24, 39.15, 7.04
29/01/2020 11:20:00, 1.23, 3.72, 16.87, 22.36, 38.13, 23.29, 8.47, 14.00, 39.39, 7.27
29/01/2020 11:21:00, 1.17, 4.47, 12.30, 15.60, 37.00, 23.34, 8.36, 13.86, 39.62, 7.24
29/01/2020 11:22:00, 1.28, 4.18, 12.80, 229.03, 36.27, 23.36, 7.54, 13.70, 39.85, 7.37
29/01/2020 11:23:00, 1.34, 4.28, 17.27, 96.94, 36.19, 23.37, 7.50, 13.54, 40.05, 7.30
So for each text file, the first (station ID) and the third (column names) will remain the same for the particular station but the second line will change with each output produced by the monitor.
As mentioned above I am looking to combine all these text files together, but under the uniform header of the column names (dd/mm/yyyy hh:mm:ss, PM1, PM2.5, PM10, TSP, RHpre, Tpre, DPpre, RHpost, Tpost, DPpost), because this will be consistent in every monitor i have access too and therefore the code can be easily reproduced.
I've attempted:
mypath = "C:/Desktop/mytxtfolder/"
txt_files_ls = list.files(path=mypath, pattern="*.txt")
txt_files_df <- lapply(txt_files_ls, function(x) {read.table(file = x,skip =3, header = T, sep =",")})
combined_df <- do.call("rbind", lapply(txt_files_df, as.data.frame))
And get consistent errors of
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
I assume this is because the second row values (time of uploads) do not match and i am using the function incorrectly to skip the first two rows and only combine on the third row.