How to resolve error 'Class attribute on column 1 of different items do not match' in R

Question

I have various .txt files stored in multiple folders. The txt files have various columns, one of which is Temperature. Few files have temperature column name as T2 [°C] while others have it as T2 [?C]. I want to keep the temperature column name as T2 [°C] in all the files. I do not want to change the names of other columns. Also, the number of columns in all the files is not the same. (e.g. Few files have columns such as Pressure, Temperature, Radiation, Wind velocity, Wind direction and other files have only Pressure, Temperature and Radiation. It can be thought of as a case of missing data. Missing columns can be added with NA values. To fix the problem of Temperature column name and number of columns, I am using the following code in R but in the end, it gives me an error: Error in rbindlist(dt.tidied, fill = TRUE) : Class attribute column 1 of item 142 does not match with column 1 of item 23.' Could anyone please help me how to modify the code to resolve the error.

install.packages("data.table")
library(data.table)

#List of files
filelist <- list.files("C:/Users/Akanksha/Desktop/BSRN/Test_Gz", full.names = TRUE, recursive 
= TRUE, pattern = ".txt$")

#Read the files
dt <- lapply(filelist, fread, skip = 27)

#Adjust Column names
dt.tidied <- lapply(dt, FUN = function(x){
  #adjust ? to degree
  setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)

  colnames(x) <- gsub("\\[", "(", colnames(x))
  colnames(x) <- gsub("\\]", ")", colnames(x))

  #return
  return(x)
})

#bind, filling missing columns to NA
merged <- rbindlist(dt.tidied, fill = TRUE, use.names = TRUE)

I tried to check the class attribute and found the following response. Both returns same answers, then I do not understand what is causing the error. Can anyone please help me.

> class(dt.tidied[[23]][1])
[1] "data.table" "data.frame"
> class(dt.tidied[[142]][1])
[1] "data.table" "data.frame"


> d1=dput(dt.tidied[[23]])
structure(list(V1 = c(NA, NA, NA), V2 = c("SRad(SRAD)", 
"Temp [?C] (TT)", "Temp QCode (TTC)"
)), row.names = c(NA, -3L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: (0x00000152b22fe7b0)>)

> d1=dput(dt.tidied[[142]])
956.902, 961.01, 965.114)), row.names = c(NA, -44615L), class = 
c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x000001afc82f7590>)    #The result of dput(dt.tidied[[142]] was too 
large, I am unable to see the initial lines, hence, I am pasting the 
last few lines of the result.

Also, the code is giving me following error after dt <- lapply(...)

Error in FUN(X[[i]], ...) : skip=27 but the input only has 25 lines
In addition: There were 50 or more warnings (use warnings() to see the 
first 50)

Edit update: I checked my data and found out that I need to skip different number of rows in different txt files. Could it be the reason which is causing the error? And how to fix it? One way I can think of is to read the files from the line next to */ because next line to */ is the header and then data starts. It is common with all the files. Kindly help.

what data does column 1 represent? R is telling you that when it compared column one classes for all the data tables in your list, list elements 23 and 142 have different classes for column 1 (i.e., element 23 could have column 1 as a date while element 142 could have column 1 as a character). — On_an_island, May 17 '23 at 15:37
In both the files, first column is Date/Time (column heading). In both the files, data is written as 2004-08-01T00:00:00 (different starting dates in both files though, but every files has different starting date). Could you please help me in modifying the code to resolve the error. — Alexia k Boston, May 17 '23 at 20:28
Try the following and edit your question with the result. `class(dt.tidied[[23]][1])` and `class(dt.tidied[[142]][1])` — On_an_island, May 17 '23 at 20:52
@On_an_island Added the results, but it is the same for both items. I do not understand what is causing the error then. Could you please help. — Alexia k Boston, May 17 '23 at 23:32
@On_an_island I have updated my question with some details about the input. Could you please help. — Alexia k Boston, May 18 '23 at 12:50
Without having access to the data you're working with it will be hard for me to tell you exactly how to fix this. It appears that as you pointed out the issue may be when you're importing files and the data contains header rows. Is the data you're working with publicly available? If it is, I can have a better idea of what is actually happening and try to come up with a solution. — On_an_island, May 18 '23 at 14:10
can you do: `d1=dput(dt.tidied[[23]])` and `d2=dput(dt.tidied[[142]])` and copy and paste the results in your answer. — On_an_island, May 18 '23 at 15:15

On_an_island · Accepted Answer · 2023-05-20T21:16:48.070

The dput you provided for dt.tidied[[142]] was incomplete so I will explain what could be happening with an example. Let's assume your d1 and d2 data.tables look like this:

library(lubridate)

d1 = structure(
  list(V1 = c(NA, NA, NA),
       V2 = c("SRad(SRAD)","Temp [?C] (TT)", "Temp QCode (TTC)")),
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame")
)

d2 = structure(
  list(V1 = c(NA, NA, NA),
       V2 = c(956.902, 961.01, 965.114)),
  row.names = c(NA, -44615L),
  class = c("data.table", "data.frame")
)

We'll make a list to hold these two data.tables and store them in element 1 and 2 of the list:

dt.tidied.working <- vector("list", length = 2)

dt.tidied.working[[1]] <- d1
dt.tidied.working[[2]] <- d2

Now we can check the class attribute of the column 1 for each list element:

> class(dt.tidied.working[[1]]$V1)
[1] "logical"

> class(dt.tidied.working[[2]]$V1)
[1] "logical"

So we can see that in this very simple example that both columns have the same class, logical, so when we use rbindlist we won't get the error you're getting.

merged <- rbindlist(dt.tidied.working, fill = TRUE, use.names = TRUE)

 V1               V2
1: NA       SRad(SRAD)
2: NA   Temp [?C] (TT)
3: NA Temp QCode (TTC)
4: NA          956.902
5: NA           961.01
6: NA          965.114

Now let's change the class attribute for the second data.table to what I think your full data.table might contain based on your comments so far:

## not working example
d1 = structure(
  list(V1 = c(NA, NA, NA),
       V2 = c("SRad(SRAD)","Temp [?C] (TT)", "Temp QCode (TTC)")),
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame")
)

d2 = structure(
  list(V1 = ymd_hms(c("2004-08-01T00:00:00","2004-08-02T00:00:00","2004-08-03T00:00:00")),
       V2 = c(956.902, 961.01, 965.114)),
  row.names = c(NA, -44615L),
  class = c("data.table", "data.frame")
)

dt.tidied.not.working <- vector("list", length = 2)

dt.tidied.not.working[[1]] <- d1
dt.tidied.not.working[[2]] <- d2

Now we'll check the class attributes again as in the previous example and you'll see that the two columns now have different attributes and when we try to rbind.list you'll get the same error.

> class(dt.tidied.not.working[[1]]$V1)
[1] "logical"
> class(dt.tidied.not.working[[2]]$V1)
[1] "POSIXct" "POSIXt"

> merged <- rbindlist(dt.tidied.not.working, fill = TRUE, use.names = TRUE)
Error in rbindlist(dt.tidied.not.working, fill = TRUE, use.names = TRUE) : 
  Class attribute on column 1 of item 2 does not match with column 1 of item 1.

Potential Solution

Based on your dput of dt.tidied[[23]] it appears that you have NA values where you should have ymd_hms values. This could be the problem you're running into. If that's the case then what you can do is inspect elements of the dt list that contains the files you read in to see which files have NA in column V1. If that is in fact the issue, you will need to figure out what to do with those observations (remove them or fix them by assigning the date/time values).

Also, the d1=dput(dt.tidied[[142]]) output that you provided in your question is incomplete as you did not include structure(... the same way you did for d1=dput(dt.tidied[[23]]). Please provide complete and accurate information as it makes it easier for us to help you.

If you don't want to inspect every single element of your dt list, you can use the rbind.fill function from the plyr library:

library(plyr)

dt.tidied.not.working <- lapply(dt.tidied.not.working, function(x) as.data.frame(x))
merged <- plyr::rbind.fill(dt.tidied.not.working)

> merged
         V1               V2
1        NA       SRad(SRAD)
2        NA   Temp [?C] (TT)
3        NA Temp QCode (TTC)
4 1.091e+09          956.902
5 1.091e+09           961.01
6 1.091e+09          965.114

How to resolve error 'Class attribute on column 1 of different items do not match' in R

1 Answers1