I have various .txt files stored in multiple folders. Each file has several columns, one of which is temperature. In the first few files the temperature column is named T2 [°C], while in the others it is named T2 [?C]. I want the temperature column to be named T2 [°C] in all files, without changing the names of the other columns. Also, the number of columns is not the same in all files: some files have Pressure, Temperature, Radiation, Wind velocity and Wind direction, while others have only Pressure, Temperature and Radiation. This can be treated as a case of missing data. The condition I have in mind is that wherever a column is named T2 [?C], it should be renamed to T2 [°C].

I tried to use colnames(my_dataframe)[colnames(my_dataframe) == "id"] = "c1", but it didn't work.

I am using the following code in R:

setwd("D:/Data/RawData/Task")
dir <- "D:/Data/RawData/Task/"
fnames <- list.files(dir, full.names = T, recursive = TRUE)
colnames(fnames)[colnames(fnames) == "T2 [?C]"] ="T2 [°C]"
xy <- do.call(rbind, lapply(fnames, read.table, header=TRUE, sep = "\t", check.names = FALSE, 
skip = 27))

Could anyone please help me change the column name and handle the differing number of columns across the files?

  • What do you mean by "the number of columns in all the files is not the same"? It seems like some text files contain more columns than others, but why is that an issue? – maarvd May 17 '23 at 11:36
  • @maarvd Yes, some files have more columns than others. For example, some files have Pressure, Temperature, Radiation, Wind velocity and Wind direction, while others have only Pressure, Temperature and Radiation. It can be thought of as a case of missing data. So I want a condition such that wherever we have T2 [?C], it is replaced with T2 [°C]. I hope that answers your question. – Alexia k Boston May 17 '23 at 11:53

1 Answer

We can tidy the column names inside an lapply call and then merge with rbindlist(..., fill = TRUE), which fills missing columns with NA. I chose to replace [] with () in the column names (use of [ ] may lead to issues).

#libraries
library(data.table)

#list of files
filelist <- list.files("D:/Data/RawData/Task/", 
                       full.names = TRUE,
                       recursive = TRUE,
                       pattern = "\\.txt$")

#read
dt <- lapply(filelist, fread)

#adjust colnames
dt.tidied <- lapply(dt, FUN = function(x){
  #adjust ? to °
  setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)
  
  #replace [] with ()
  colnames(x) <- gsub("\\[", "(", colnames(x))
  colnames(x) <- gsub("\\]", ")", colnames(x))
  
  #return
  return(x)
})


#bind, filling missing columns to NA
merged <- rbindlist(dt.tidied, fill = TRUE)
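To see what the two key calls do without touching any files, here is a minimal sketch on toy data. The tables `a` and `b` and their values are made up for illustration; they only stand in for what fread would return.

```r
library(data.table)

# Two toy tables standing in for files read with fread (values are made up)
a <- data.table(Pressure = c(1010, 1011), `T2 [?C]` = c(20.1, 20.4))
b <- data.table(Pressure = 1009, `T2 [°C]` = 19.8, Radiation = 350)

tables <- lapply(list(a, b), function(x) {
  # skip_absent = TRUE leaves tables without the old name untouched
  # instead of raising an error
  setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)
  x
})

# fill = TRUE pads columns that a table lacks with NA
merged <- rbindlist(tables, fill = TRUE)
print(merged)
```

Rows coming from `a` should end up with NA in Radiation, since that table has no such column.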
maarvd
  • Thank you for your response. Since the first lines of all my data files contain metadata that I want to skip (as in my code), I modified your code to dt <- lapply(filelist, fread, skip = 27). On merged <- rbindlist(dt.tidied, fill = TRUE), it returns the error 'Class attribute column 1 of item 142 does not match with column 1 of item 23. ' What does that mean? How can I resolve it? – Alexia k Boston May 17 '23 at 14:47
  • See https://stackoverflow.com/questions/55706560/make-rbindlist-skip-ignore-or-change-class-attribute-of-the-column for some options. Your issue is that some columns are present in several files, but the data is stored as different classes. – maarvd May 17 '23 at 14:56
  • I later found out that the number of initial lines to skip is different in each file, so skip = 27 cannot work. I tried to adapt the code from https://stackoverflow.com/questions/46223063/r-import-files-with-differing-number-of-initial-rows-to-skip but I am unable to rewrite it for my requirement. Could you please help me with it? – Alexia k Boston May 18 '23 at 17:09
  • For structure, could you ask that as a separate question and provide some sample data? – maarvd May 22 '23 at 14:42
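For the two follow-up problems in the comments, one possible approach is sketched below. It is not tested against the real files and rests on two assumptions: the header row contains a string that never appears in the metadata lines (here "Pressure"), and it is acceptable to read every column as character and convert the types after binding. The toy files and plain ASCII column names are made up to keep the example self-contained.

```r
library(data.table)

# Toy files with a different number of metadata lines before the header
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("station: A", "note: test",
             "Pressure\tTemperature", "1010\t20.1", "1011\t20.4"), f1)
writeLines(c("station: B",
             "Pressure\tTemperature\tRadiation", "1009\t19.8\t350"), f2)

# skip = "Pressure" makes fread start at the first line containing that
# string; colClasses = "character" gives every column the same class,
# so rbindlist has no class attributes to reconcile
dt <- lapply(c(f1, f2), fread, skip = "Pressure", sep = "\t",
             header = TRUE, colClasses = "character")

merged <- rbindlist(dt, fill = TRUE)

# Convert the character columns back to suitable types after binding
merged[, (names(merged)) := lapply(.SD, type.convert, as.is = TRUE)]
str(merged)
```

If a column mixes numbers with text in the real data, type.convert will leave it as character, which at least shows which columns need cleaning.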