How to include conditional statements while reading the data in R?

Question

I am using the R code below to read data (.txt files) stored in several folders. I need to add the following conditions to the code.

The structure of my txt files is as follows:

Date/Time XY [XY] C1 [m2] C1c C2 [m] C2c C3 [W] C3c K PP [Pa]..
2005-03-01S01:00:00 0.98 250 0 29 0 289 0 98 289...
2005-03-01S02:00:00 0.97 240 0 28 2 279 0 98 89...
2005-03-01S03:00:00 0.98 252 -1 29 0 289 0 16 289...
..
..

I want the following conditions to be included in the code:

   if C1c is not = 0, then C1 = NA,
   if -400 > C1 > 350, then C1 = NA,
   if C2c is not = 0, then C2 = NA,
   if -250 > C2 > 450, then C2 = NA, 
   if C3c is not = 0, then C3 = NA,
   if 100 > C3 > 500, then C3 = NA,
   if K < 90, then K = NA
   if PP < 200, then PP = NA

Note that not every text file has all of these columns. A condition should therefore be applied only if the file contains the column it concerns.

Existing code:

library(data.table)

filelist <- list.files("D:/Test2/", full.names = TRUE, recursive = TRUE,
                       pattern = "\\.txt$")  # escape the dot so it matches a literal "."
dt <- lapply(filelist, function(file) {
  lines <- readLines(file)
  comment_end = match("*/", lines)
  fread(file, skip = comment_end)
})

dt.tidied <- lapply(dt, FUN = function(x){
  setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)
  colnames(x) <- gsub("\\[", "(", colnames(x))
  colnames(x) <- gsub("\\]", ")", colnames(x))

  return(x)
})

merged <- rbindlist(dt.tidied, fill = TRUE, use.names = TRUE)

write.csv(merged, "D:/Test2/Merged2.csv")

Could anyone please help me modify the code to include these conditions?

Why do you need the conditions to be applied while reading the data? Is it ok to read the data in and then do the processing? — Conor Neilson, May 20 '23 at 00:27

score 1 · Accepted Answer · answered May 20 '23 at 00:29

Test whether a column exists before running any operation that relies on it, e.g.:

dt.tidied <- lapply(dt, FUN = function(x){
  setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)
  colnames(x) <- gsub("\\[", "(", colnames(x))
  colnames(x) <- gsub("\\]", ")", colnames(x))
  
  # Apply each condition only when every column it uses is present.
  # Checking both the value and the flag column avoids `:=` silently
  # creating a new, empty value column in files that only have the flag.
  if (all(c("C1", "C1c") %in% colnames(x))) {
    x[C1c != 0, C1 := NA]
    x[C1 < -400 | C1 > 350, C1 := NA]
  }
  
  if (all(c("C2", "C2c") %in% colnames(x))) {
    x[C2c != 0, C2 := NA]
    x[C2 < -250 | C2 > 450, C2 := NA]
  }
  
  if (all(c("C3", "C3c") %in% colnames(x))) {
    x[C3c != 0, C3 := NA]
    x[C3 < 100 | C3 > 500, C3 := NA]
  }
  
  if ("K" %in% colnames(x)) {
    x[K < 90, K := NA]
  }
  
  if ("PP" %in% colnames(x)) {
    x[PP < 200, PP := NA]
  }
  
  return(x)
})
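
If the list of checks grows, the same column-existence test can be driven by a small lookup table instead of one `if` block per column. The following is only a sketch and not part of the original answer. The `rules` table and the `apply_rules()` helper are hypothetical names. It assumes the columns are named exactly `C1`, `C1c`, `K` and so on (no unit suffixes), and that each range condition means "below the lower bound or above the upper bound". Unlike the code above, it applies the range check even when the flag column is missing.

```r
library(data.table)

# Hypothetical rule table: value column, optional flag column, allowed range.
rules <- data.table(
  value = c("C1", "C2", "C3", "K", "PP"),
  flag  = c("C1c", "C2c", "C3c", NA, NA),
  lower = c(-400, -250, 100, 90, 200),
  upper = c(350, 450, 500, Inf, Inf)
)

apply_rules <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    v <- rules$value[i]
    f <- rules$flag[i]
    if (!v %in% names(x)) next  # value column missing: skip this rule
    # Rows whose value lies outside [lower, upper]
    bad <- x[[v]] < rules$lower[i] | x[[v]] > rules$upper[i]
    # Rows whose quality flag is non-zero (only if the flag column exists)
    if (!is.na(f) && f %in% names(x)) bad <- bad | x[[f]] != 0
    # Overwrite the offending rows by reference
    set(x, i = which(bad), j = v, value = NA)
  }
  x
}

# Toy data loosely modelled on the sample rows in the question
x <- data.table(C1 = c(250, 240, 252, 375),
                C1c = c(0, 0, -1, 0),
                K = c(98, 98, 16, 95))
apply_rules(x, rules)
# C1 becomes 250, 240, NA, NA and K becomes 98, 98, NA, 95
```

Inside the existing `lapply()`, the helper could be called as `apply_rules(x, rules)` just before `return(x)`, so changing a threshold only means editing one row of `rules`.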

I think this approach becomes unwieldy as the number of conditions grows or when conditions need adjusting. It is also hard to generalize to cases where each dt has a different set of conditions. — Hieu Nguyen, May 20 '23 at 19:16
"*So, the logic should be if the file has the concerned column*": this is literally what was requested. If you have a simpler approach, please provide an answer. — Paul Maxwell, May 20 '23 at 23:31

Hieu Nguyen · Answer 2 · 2023-05-20T19:19:05.497

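First, these conditions can never be true as written (a value cannot be both below -400 and above 350, for example), so please recheck them:

if -400 > C1 > 350, then C1 = NA, # I changed this to 400 > C1 > 350 for the below example
if -250 > C2 > 450, then C2 = NA, 
if 100 > C3 > 500, then C3 = NA,

With that caveat, one option is to store each rule as a quoted expression and apply it only when the data.table contains every column the rule refers to: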

library(data.table)

dt <- data.table(
    XY = c(0.98, 0.97, 0.98, 0.97), 
    C1 = c(250, 240, 252, 375), 
    C1c = c(0, 0, -1, 0)
) # sample data

le <- list(
    quote(if(C1c != 0) C1 <- NA),
    quote(if((400 > C1) & (C1 > 350)) C1 <- NA),
    quote(if(C2c != 0) C2 <- NA),
    quote(if((-250 > C2) & (C2 > 450)) C2 <- NA),
    quote(if(C3c != 0) C3 <- NA),
    quote(if((100 > C3) & (C3 > 500)) C3 <- NA),
    quote(if(K < 90) K <- NA),
    quote(if(PP < 200) PP <- NA)
) # translate your condition to standard R code and put them here
names(le) <- lapply(le, deparse) # to get pretty names
lapply(le, \(ee){
    cond <- ee[[2]] # take the condition part
    action <- ee[[3]] # take the action part
    action[[1]] <- quote(`:=`) # replace `<-` function for `:=` of data.table
    n_dt <- names(dt)
    if (all(all.vars(cond) %chin% n_dt) && all(all.vars(action) %chin% n_dt)) { # `&&` for a scalar test
        eval(bquote(dt[.(cond), .(action)])) # build data.table syntax using condition and action parts, then evaluate
        copy(dt) # optional: keep record of each transformation of data.table
    } else sprintf("variables are not available in data.table")
})

####################RESULT################
$`if (C1c != 0) C1 <- NA`
     XY  C1 C1c
1: 0.98 250   0
2: 0.97 240   0
3: 0.98  NA  -1
4: 0.97 375   0

$`if ((400 > C1) & (C1 > 350)) C1 <- NA`
     XY  C1 C1c
1: 0.98 250   0
2: 0.97 240   0
3: 0.98  NA  -1
4: 0.97  NA   0

$`if (C2c != 0) C2 <- NA`
[1] "variables are not available in data.table"

$`if ((-250 > C2) & (C2 > 450)) C2 <- NA`
[1] "variables are not available in data.table"

$`if (C3c != 0) C3 <- NA`
[1] "variables are not available in data.table"

$`if ((100 > C3) & (C3 > 500)) C3 <- NA`
[1] "variables are not available in data.table"

$`if (K < 90) K <- NA`
[1] "variables are not available in data.table"

$`if (PP < 200) PP <- NA`
[1] "variables are not available in data.table"
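
To use this idea on every file rather than on a single table, the loop can be wrapped in a function and mapped over the list of data.tables. This is a minimal sketch under stated assumptions, not the answer's own code. `apply_quoted_rules()` is a hypothetical helper name. The rules are written as condition/action pairs instead of `if` calls, and the range rule uses `|` on the assumption that "outside the range" was intended.

```r
library(data.table)

# Each rule pairs a row-selecting condition with an action in `:=` form.
rules <- list(
  list(cond = quote(C1c != 0),             action = quote(C1 := NA)),
  list(cond = quote(C1 < -400 | C1 > 350), action = quote(C1 := NA)),
  list(cond = quote(K < 90),               action = quote(K := NA))
)

apply_quoted_rules <- function(x, rules) {
  for (r in rules) {
    # Columns referenced by either the condition or the action
    needed <- union(all.vars(r$cond), all.vars(r$action))
    if (all(needed %chin% names(x))) {
      # Build x[<cond>, <action>] and evaluate it; `:=` updates x by reference
      eval(bquote(x[.(r$cond), .(r$action)]))
    }
  }
  x
}

# Toy stand-ins for two files with different columns
tables <- list(
  data.table(C1 = c(250, 375), C1c = c(0, 0)),
  data.table(K = c(98, 16))
)
out <- lapply(tables, apply_quoted_rules, rules = rules)
# out[[1]]$C1 is 250, NA and out[[2]]$K is 98, NA
```

Rules whose columns are absent are skipped silently here. Returning a message instead, as in the loop above, is an equally valid choice when a record of skipped rules is useful.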
