I have various .txt files stored in multiple folders. The txt files have various columns, one of which is Temperature. Some files name the temperature column T2 [°C], while others name it T2 [?C]. I want the temperature column to be named T2 [°C] in all the files, without changing the names of the other columns.

Also, the number of columns is not the same in all the files. For example, some files have columns such as Pressure, Temperature, Radiation, Wind velocity and Wind direction, while other files have only Pressure, Temperature and Radiation. This can be treated as a case of missing data: missing columns can be added with NA values.

To fix both the temperature column name and the differing number of columns, I am using the following code in R, but in the end it gives me this error:

Error in rbindlist(dt.tidied, fill = TRUE) : Class attribute column 1 of item 142 does not match with column 1 of item 23.

Could anyone please help me modify the code to resolve the error?
install.packages("data.table")
library(data.table)
#List of files
filelist <- list.files("C:/Users/Akanksha/Desktop/BSRN/Test_Gz", full.names = TRUE, recursive = TRUE, pattern = "\\.txt$")
#Read the files
dt <- lapply(filelist, fread, skip = 27)
#Adjust Column names
dt.tidied <- lapply(dt, FUN = function(x){
#adjust ? to degree
setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)
colnames(x) <- gsub("\\[", "(", colnames(x))
colnames(x) <- gsub("\\]", ")", colnames(x))
#return
return(x)
})
#bind, filling missing columns to NA
merged <- rbindlist(dt.tidied, fill = TRUE, use.names = TRUE)
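To check the renaming and filling steps in isolation, here is a minimal sketch that runs the same setnames()/gsub() logic on two small in-memory tables and then binds them with rbindlist(..., fill = TRUE, use.names = TRUE). The tables and the column names Pressure [hPa] and Radiation [W/m2] are made up for illustration; only the T2 names come from the question. One thing it makes visible: the two gsub() calls also turn the square brackets into parentheses, so the final temperature name is T2 (°C), not T2 [°C]. If the bracketed name is the one you want to keep, those two lines would need to go.

```r
library(data.table)

# Two made-up tables standing in for two files (column names are invented).
a <- data.table(p = c(950, 951), t = c(12.1, 12.4))
setnames(a, c("Pressure [hPa]", "T2 [?C]"))
b <- data.table(p = 949, t = 11.8, r = 310)
setnames(b, c("Pressure [hPa]", "T2 [\u00b0C]", "Radiation [W/m2]"))

# Same steps as the tidying function in the question, written with setnames()
tidy <- function(x) {
  # rename only where the "?" variant is present; otherwise do nothing
  setnames(x, old = "T2 [?C]", new = "T2 [\u00b0C]", skip_absent = TRUE)
  # turn "[" into "(" and "]" into ")" in every column name
  setnames(x, gsub("\\]", ")", gsub("\\[", "(", names(x))))
  x
}

merged <- rbindlist(lapply(list(a, b), tidy), fill = TRUE, use.names = TRUE)
names(merged)
# "Pressure (hPa)"   "T2 (°C)"          "Radiation (W/m2)"
merged  # Radiation is NA for the two rows that came from `a`
```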
I tried to check the class attribute and got the following output. Both calls return the same answer, so I do not understand what is causing the error. Can anyone please help me?
> class(dt.tidied[[23]][1])
[1] "data.table" "data.frame"
> class(dt.tidied[[142]][1])
[1] "data.table" "data.frame"
> d1=dput(dt.tidied[[23]])
structure(list(V1 = c(NA, NA, NA), V2 = c("SRad(SRAD)",
"Temp [?C] (TT)", "Temp QCode (TTC)"
)), row.names = c(NA, -3L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: (0x00000152b22fe7b0)>)
> d1=dput(dt.tidied[[142]])
956.902, 961.01, 965.114)), row.names = c(NA, -44615L), class =
c("data.table", "data.frame"), .internal.selfref = <pointer:
0x000001afc82f7590>)
The output of dput(dt.tidied[[142]]) was too large for me to see its first lines, so I have pasted only the last few lines above.
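Two checks that may help narrow this down. First, on a data.table, dt.tidied[[23]][1] selects the first row and returns another data.table, which is why both calls above print "data.table" "data.frame"; the class of the first column is dt.tidied[[23]][[1]]. Second, item 23 has fread's default names V1 and V2 and header-like text ("Temp [?C] (TT)") in its rows, which suggests the skip value did not land on the header line for that file. The sketch below, on invented stand-in tables (good, bad, other) with a hypothetical helper col_classes(), lists the class of every column per item, reports column names whose class differs between items, and flags items whose names are all V1, V2, and so on.

```r
library(data.table)

# Made-up stand-ins for items of dt.tidied (names and values are invented):
# `good` has a proper header, `bad` mimics item 23 (default V1/V2 names,
# header-like text in the rows), `other` has Time read as plain text.
good  <- data.table(Time = as.POSIXct("2020-01-01 12:30", tz = "UTC"), SWD = 5)
bad   <- data.table(V1 = c(NA, NA, NA),
                    V2 = c("SRad(SRAD)", "Temp [?C] (TT)", "Temp QCode (TTC)"))
other <- data.table(Time = "2020-01-01 12:40", SWD = 6)
tables <- list(good, bad, other)

class(good[1])    # first ROW of a data.table: "data.table" "data.frame"
class(good[[1]])  # first COLUMN: "POSIXct" "POSIXt"

# Hypothetical helper: one row per (item, column) with that column's class
col_classes <- function(tabs) {
  rbindlist(lapply(seq_along(tabs), function(i) {
    tb <- tabs[[i]]
    data.table(item   = i,
               column = names(tb),
               type   = vapply(tb, function(col) paste(class(col), collapse = "/"),
                               character(1)))
  }))
}
cls <- col_classes(tables)

# Column names that carry more than one class across items
conflicts <- cls[, .(n_types = uniqueN(type)), by = column][n_types > 1]
conflicts$column  # "Time"

# Items whose names are all fread defaults (V1, V2, ...): header probably missed
no_header <- vapply(tables, function(tb) all(grepl("^V[0-9]+$", names(tb))),
                    logical(1))
which(no_header)  # 2
```

On the real data, col_classes(dt.tidied) with the same conflicts query, and filelist[which(no_header)], would point to the columns and files worth inspecting, since lapply() keeps dt.tidied in the same order as filelist.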
Also, running dt <- lapply(...) gives me the following error:
Error in FUN(X[[i]], ...) : skip=27 but the input only has 25 lines
In addition: There were 50 or more warnings (use warnings() to see the first 50)
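This error means that at least one file has only 25 lines, so a fixed skip = 27 runs past its end. A minimal base-R sketch to list which files are too short for a given fixed skip, assuming the files can be read with readLines(); count_lines() and too_short() are hypothetical helper names.

```r
# Hypothetical helper: total number of lines in each file (names = file paths)
count_lines <- function(files) {
  vapply(files, function(f) length(readLines(f, warn = FALSE)), integer(1))
}

# Files with no line left to read after skipping `skip` lines
too_short <- function(files, skip = 27L) {
  n <- count_lines(files)
  names(n)[n <= skip]
}

# Usage with the question's file list:
# too_short(filelist)
```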
Update: I checked my data and found that different txt files need a different number of rows to be skipped. Could this be what is causing the error, and how can I fix it? One approach I can think of is to start reading each file from the line after */, because in every file the line after */ is the header and the data starts right below it. Kindly help.
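Very likely, yes: a fixed skip = 27 is only right for files whose header sits on line 28. For other files either the wrong lines end up as header and data (consistent with item 23 getting default V1/V2 names) or the skip runs past the end of the file (the skip=27 error). Reading from the line after */ can be sketched as below, assuming every file has a line containing */ and the header is on the very next line; read_after_marker() is a hypothetical helper, not a data.table function. fread() also accepts a string for skip and then starts on the line containing that string, but with skip = "*/" the */ line itself would become the header, so the sketch computes the line number instead and passes it as an integer skip.

```r
library(data.table)

# Hypothetical helper: read one file starting at the line after the first "*/".
# Assumes the line right after "*/" is the header row.
read_after_marker <- function(file, marker = "*/") {
  lines <- readLines(file, warn = FALSE)
  # byte-wise fixed match, so odd encodings elsewhere in the file do not matter
  pos <- grep(marker, lines, fixed = TRUE, useBytes = TRUE)[1]
  if (is.na(pos)) stop("no line containing '", marker, "' in ", file)
  # skip = pos drops the marker line and everything above it
  fread(file, skip = pos, header = TRUE)
}

# Usage with the question's variables:
# dt <- lapply(filelist, read_after_marker)
# dt.tidied <- lapply(dt, ...)   # tidying function from the question
# merged <- rbindlist(dt.tidied, fill = TRUE, use.names = TRUE)
```

If rbindlist() still reports a class mismatch after this, the same column is probably being parsed as different types in different files, and checking the class of each column per file is the next step.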