This is the kind of data I have:

Date Station Param1 Param2
2020-01-01 A <5 45
2020-02-01 B <5 47

To be able to plot this data, mark the LOQ-values (<5) and compute some basic statistics, I need to create new columns with the LOQ-flag (<) and numeric values separated.

I don't have exact knowledge of the Param-names (they are actually "Fe", "Cu", "N-tot" and so on), so I would like to loop over the Param-columns (not Date and Station) and create two new columns for each Param, one with the numerical data and one with the LOQ-flag. Like this:

Date Station Param1_org Param1_new Param1_loq Param2_org Param2_new Param2_loq
2020-01-01 A <5 5 < 45 45 =
2020-02-01 B <5 5 < 47 47 =

I have tried mutate (dplyr), but I am struggling with how to combine the conditions with gsub inside mutate and across. I also considered using apply and a list of Params, but got lost in the code.

I need some advice on which approach to choose, and a simple example of how to achieve this. I appreciate all help given!

  • Combining `mutate` with functions from the `stringr` package (part of the Tidyverse) would be my choice here; `stringr` can do everything you want with relatively intuitive code. See here for a [cheat sheet](https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf), which I almost always have open when I'm using R. – C. Murtaugh Jun 11 '23 at 21:28
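As a rough sketch of what that `stringr` suggestion could look like on a single character vector (the sample values `>100` and `0.3` are made up for illustration and are not from the question):

```r
library(stringr)

# Hypothetical sample values covering "<", ">" and plain numbers
x <- c("<5", "45", ">100", "0.3")

# Pull out a leading "<" or ">"; values without one get "="
loq_flag <- str_extract(x, "^[<>]")
loq_flag[is.na(loq_flag)] <- "="

# Strip the leading flag and convert the rest to numeric
value <- as.numeric(str_remove(x, "^[<>]"))
```

The same two expressions can be placed inside `mutate` to work on a data frame column instead of a standalone vector.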

1 Answer

Here is one way to do it:

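library(tidyverse)

data <- tibble(Date = as.Date(c("2020-01-01", "2020-02-01")),
               Station = c("A", "B"),
               Param1 = c("<5", "<5"),
               Param2 = c("45", "47"))

# Pick the parameter columns by name pattern.
# With real parameter names (Fe, Cu, ...), use
# setdiff(cols, c("Date", "Station")) instead.
cols <- colnames(data)
param_cols <- cols[str_detect(cols, "^Param")]

for (col in param_cols) {
  col_org <- paste(col, "org", sep = "_")
  col_new <- paste(col, "new", sep = "_")
  col_loq <- paste(col, "loq", sep = "_")
  data <- data %>%
    mutate(
      # keep the original text value
      !!col_org := .data[[col]],
      # numeric part (optional decimals), converted from text to numeric
      !!col_new := as.numeric(str_extract(.data[[col]], "\\d+\\.?\\d*")),
      # "=" for plain numbers, "<" for values below the LOQ, ">" otherwise
      !!col_loq := ifelse(str_detect(.data[[col]], "^\\d"),
                          "=",
                          ifelse(str_detect(.data[[col]], "^<"), "<", ">")),
      # drop the original column now that it has been split
      !!col := NULL
    )
}

print(data)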

The code simply loops over all columns whose names start with Param and calls mutate on each one, using regex detection to build the new columns. The !! unquotes the variable so that its value, rather than its name, is used as the column name on the left-hand side of := (note: dplyr version 1.0 or higher).
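For comparison, the same split can probably be written without an explicit loop by using `across()` with a named list of functions. This is only a sketch, not the method used above: it assumes dplyr 1.0 or higher, that `Date` and `Station` are the only non-parameter columns, and that the numeric part matches the regex `\\d+\\.?\\d*`. The resulting column order differs from the one in the question.

```r
library(dplyr)
library(stringr)

data <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Param1 = c("<5", "<5"),
  Param2 = c("45", "47"),
  stringsAsFactors = FALSE
)

# Assumption: everything except the identifier columns is a parameter
param_cols <- setdiff(names(data), c("Date", "Station"))

result <- data %>%
  mutate(
    across(
      all_of(param_cols),
      list(
        # numeric part of the value
        new = ~ as.numeric(str_extract(.x, "\\d+\\.?\\d*")),
        # LOQ flag; missing values stay missing
        loq = ~ case_when(
          is.na(.x) ~ NA_character_,
          str_detect(.x, "^<") ~ "<",
          str_detect(.x, "^>") ~ ">",
          TRUE ~ "="
        )
      ),
      .names = "{.col}_{.fn}"
    )
  ) %>%
  # rename the untouched originals to <name>_org
  rename_with(~ paste0(.x, "_org"), .cols = all_of(param_cols))

print(result)
```

The `.names` glue pattern is what produces the `_new` and `_loq` suffixes, so no manual `paste()` of column names is needed.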

Abdullah Faqih
  • Thanks @Abdullah Faqih, it took some time for me to really understand this, but I finally managed to adjust it to my needs. It's somewhat weird that dplyr makes it so difficult for us to use the current column name. I have noted that there is a cur_column() for using within functions, but in my case I was using an ifelse and never managed to make it work. Your solution was easier! – Martin Liungman Jun 15 '23 at 21:48
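On the `cur_column()` point in that comment: it does appear to work together with `ifelse`, as long as it is called inside the function passed to `across()`. A small sketch, assuming dplyr 1.0 or higher; the `_note` columns and their label text are hypothetical and only serve to show the column name being picked up:

```r
library(dplyr)

data <- data.frame(
  Station = c("A", "B"),
  Param1 = c("<5", "<5"),
  Param2 = c("45", "47"),
  stringsAsFactors = FALSE
)

labelled <- data %>%
  mutate(
    across(
      -Station,
      # cur_column() returns the name of the column being processed
      ~ ifelse(startsWith(.x, "<"), paste(cur_column(), "below LOQ"), "ok"),
      .names = "{.col}_note"
    )
  )

print(labelled)
```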