Looping over data frame to "clean" data

Question

This is the kind of data I have:

Date	Station	Param1	Param2
2020-01-01	A	<5	45
2020-02-01	B	<5	47

To be able to plot this data, mark the LOQ-values (<5) and compute some basic statistics, I need to create new columns with the LOQ-flag (<) and numeric values separated.

I don't have exact knowledge of the Param-names (they are actually "Fe", "Cu", "N-tot" and so on), so I would like to loop over the Param-columns (not Date and Station) and create two new columns for each Param, one with the numerical data and one with the LOQ-flag. Like this:

Date	Station	Param1_org	Param1_new	Param1_loq	Param2_org	Param2_new	Param2_loq
2020-01-01	A	<5	5	<	45	45	=
2020-02-01	B	<5	5	<	47	47	=

I have tried mutate (dplyr), but I am struggling with how to use the conditions together with gsub inside mutate and across. I also considered using apply and a list of Params, but got lost in the code.

I need some advice on which approach to choose, and a simple example of how to achieve this. I appreciate all help given!

Combining `mutate` with functions of the `stringr` package (part of the Tidyverse) would be my choice here, `stringr` can do everything you want with relatively intuitive coding. See here for a [cheat sheet](https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf), which I almost always have open when I'm using R. — C. Murtaugh, Jun 11 '23 at 21:28

score 0 · Accepted Answer · answered Jun 11 '23 at 22:02

Here is one way to do it:

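library(tidyverse)

# Example data: values below the limit of quantification (LOQ) are stored as text, e.g. "<5"
data <- tibble(Date = as.Date(c("2020-01-01", "2020-02-01")),
               Station = c("A", "B"),
               Param1 = c("<5", "<5"),
               Param2 = c("45", "47"))

# Parameter columns. If the names are not known in advance,
# setdiff(colnames(data), c("Date", "Station")) selects them as well.
cols <- colnames(data)
param_cols <- cols[str_detect(cols, "^Param")]

for (col in param_cols) {
  col_org <- paste(col, "org", sep = "_")
  col_new <- paste(col, "new", sep = "_")
  col_loq <- paste(col, "loq", sep = "_")
  data <- data %>%
    mutate(
      # Keep the original text value
      !!col_org := .data[[col]],
      # Numeric part (digits with an optional decimal part), converted to a number
      !!col_new := as.numeric(str_extract(.data[[col]], "\\d+\\.?\\d*")),
      # Flag: "=" for a plain number, "<" for below LOQ, ">" for anything else
      !!col_loq := ifelse(str_detect(.data[[col]], "^\\d"),
                          "=",
                          ifelse(str_detect(.data[[col]], "^<"), "<", ">")),
      # Drop the original column
      !!col := NULL
    )
}

print(data)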

The code loops over every column whose name starts with Param and, for each one, uses mutate with regex detection to build the three new columns. The !! operator injects the value of a variable so that it can be used as a column name on the left-hand side of := inside a dplyr call (note: dplyr version 1.0 or higher).
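For comparison, the same result can probably be reached without an explicit loop by using `across()` with its `.names` argument. The following is only a sketch and not part of the original answer. It assumes that every column other than Date and Station is a parameter column; the name `cleaned` and the decimal-aware regex are illustrative choices.

```r
library(dplyr)
library(stringr)

data <- tibble(
  Date    = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Param1  = c("<5", "<5"),
  Param2  = c("45", "47")
)

# Assumption: everything except Date and Station is a parameter column
param_cols <- setdiff(names(data), c("Date", "Station"))

cleaned <- data %>%
  mutate(
    # Numeric part of each value, stored in <name>_new
    across(all_of(param_cols),
           ~ as.numeric(str_extract(.x, "\\d+\\.?\\d*")),
           .names = "{.col}_new"),
    # LOQ flag, stored in <name>_loq; missing values stay missing
    across(all_of(param_cols),
           ~ case_when(is.na(.x)            ~ NA_character_,
                       str_detect(.x, "^<") ~ "<",
                       str_detect(.x, "^>") ~ ">",
                       TRUE                 ~ "="),
           .names = "{.col}_loq")
  ) %>%
  # Rename the untouched originals to <name>_org
  rename_with(~ paste0(.x, "_org"), all_of(param_cols))

print(cleaned)
```

The columns come out grouped by suffix (all `_org`, then all `_new`, then all `_loq`) rather than by parameter; `select()` or `relocate()` can reorder them if the layout in the question matters.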

Thanks @Abdullah Faqih, it took some time for me to really understand this, but I finally managed to adjust it to my needs. It's somewhat weird that dplyr makes it so difficult for us to use the current column name. I have noted that there is a cur_column() for using within functions, but in my case I was using an ifelse and never managed to make it work. Your solution was easier! — Martin Liungman, Jun 15 '23 at 21:48
