I am trying to create a custom function that I can apply across various columns to recode values from character to numeric data. The data has many blanks, and every non-blank entry in a given column is the same string (i.e. a "select all that apply" survey question, where each choice needs a binary 1/0 variable indicating whether it was selected). So, logically, I am trying to create a function that does the following:

"In a specified range of columns in the data, if there is any character present, recode that entry as 1, otherwise mark as NA"

This works perfectly as a standalone statement:

data$var <- if_else(data$var == data$var[grep("[a-z]", data$var)], 1, NULL)

But I am having trouble creating a function that does this that I can apply to many different columns.

I have tried to solve this with lapply, mutate, and if_else in the following ways to no avail.

I can return the indices correctly with the following fxn but need to update the actual dataframe:

fxn <- function(x) {
  if_else(x == (x[grep("[a-z]", x)]), 1, NULL)
}

fxn(data$variable)

But when I try to use mutate to update the dataframe as follows it doesn't work:

data %>% 
  mutate(across(.cols = variable, fxn))

Any help would be appreciated as there are 100+ columns I need to do this on!

Andre Wildberg
Allison
  • I guess you want something like `if_else(grepl("[a-z]", x), 1, NA)` – Ric Jan 05 '23 at 23:14
  • @akrun I switched to NA, but got the following error: "Error in `if_else()`: ! `false` must be a double vector, not a logical vector." So I switched it again to NA_real_, which got rid of the error, and it is returning the correct indices, but I still can't get it to update the actual column in the dataframe. – Allison Jan 05 '23 at 23:28
  • I want to use something like: data[,columns] <- lapply(data[,columns], FUN = fxn) – Allison Jan 05 '23 at 23:31
  • @akrun That worked well, thanks! Any idea how to now apply this function to a range of columns so that the cells recoded as 1 are actually updated in the dataframe itself, not just returned in the console? – Allison Jan 05 '23 at 23:41

1 Answer

We create the function, apply it to the selected columns with lapply, and assign the result back to those same columns. In the example below, columns 1 to 5 are selected:

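# NA^0 is 1 and NA^1 is NA, so entries containing a lowercase letter become 1, all others NA
fxn <- function(x) NA^(!grepl('[a-z]', x))
# assigning back to data[1:5] is what updates the data frame itself
data[1:5] <- lapply(data[1:5], fxn)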
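
For the dplyr route tried in the question, a minimal sketch follows. It assumes the `mutate()` call failed to update the data only because its result was never assigned back to `data`, and it uses a small made-up data frame (`id`, `q1_a`, and `q1_b` are hypothetical column names). `if_else()` needs `NA_real_` rather than `NA` or `NULL` so that both branches are doubles.

```r
library(dplyr)

# Hypothetical example data: blank or NA where a choice was not selected
data <- data.frame(
  id   = 1:4,
  q1_a = c("apple", "", NA, "apple"),
  q1_b = c("", "pear", "pear", ""),
  stringsAsFactors = FALSE
)

# 1 if the entry contains a lowercase letter, NA otherwise
fxn <- function(x) if_else(grepl("[a-z]", x), 1, NA_real_)

# mutate() returns a new data frame, so assign the result back
data <- data %>%
  mutate(across(q1_a:q1_b, fxn))
```

Any tidyselect expression can replace `q1_a:q1_b` inside `across()`, which may be more convenient than positions when there are 100+ columns.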
akrun
  • Yes! This pretty much worked with one small adjustment with the comma, as I was working with just the columns: data[,1:5] <- lapply(data[,1:5], fxn) -- Thanks so much! – Allison Jan 06 '23 at 00:01
  • @Allison It was just an example to show that you can select columns with either a vector of column names or their indices. Here it is columns 1 to 5, which you may need to change. – akrun Jan 06 '23 at 00:03
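
As a sketch of the name-based selection mentioned in the last comment, the same base R assignment can take a character vector of column names instead of positions. The data frame and the `q1_` prefix below are hypothetical, made up only for illustration.

```r
# Hypothetical example data
data <- data.frame(
  id   = 1:3,
  q1_a = c("apple", "", "apple"),
  q1_b = c("", "pear", ""),
  stringsAsFactors = FALSE
)

fxn <- function(x) NA^(!grepl("[a-z]", x))

# Build the vector of column names from a pattern (assumed "q1_" prefix)
cols <- grep("^q1_", names(data), value = TRUE)

# Select by name instead of position and assign back
data[cols] <- lapply(data[cols], fxn)
```

Selecting by name keeps the recode from touching the wrong columns if the column order changes later.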