R - refer to column names rather than column index when using lapply with data frame

Question

I am using lapply to take values from specific columns of a data frame and change them from a 1-5 scale to the inverse (i.e., 1 becomes 5, 2 becomes 4). I have managed to do this by referring to the column index:

df_vars[,c(104:183, 222:249, 271:290)] <- lapply(df_vars[,c(104:183, 222:249, 271:290)],
                                                 FUN = function(x) misty::item.reverse(x, min = 1, max = 5))

I want to be able to do the same thing but using column names instead. I cannot do this by just referring to all numeric columns or columns with ranges from 1 to 5, as not all of the columns with 1-5 scales need inverting. I also may need to drop columns and then rerun this code, so I would like to refer to column names instead.

Here is some example data to illustrate:

# create example data frame
df <- data.frame("A" = c(1, 3, 5),
                 "B" = c(1, 2, 3),
                 "C" = c(4, 2, 1),
                 "D" = c(3, 2, 5),
                 "E" = c(5, 5, 4),
                 "F" = c(1, 2, 1),
                 "G" = c(3, 4, 3),
                 "H" = c(4, 3, 2))

# for this example, only A B D F G H need to be inverted

This is a small data frame, but my data frame is much larger with over 100 columns to invert, so pretend the example data set is too big to realistically work with one column at a time.

Using the example data and specified columns to invert, the desired output would be the following data frame:

# desired (transformed) data frame
df_desired <- data.frame("A" = c(5, 3, 1),
                         "B" = c(5, 4, 3),
                         "C" = c(4, 2, 1),
                         "D" = c(3, 4, 1),
                         "E" = c(5, 5, 4),
                         "F" = c(5, 4, 5),
                         "G" = c(3, 2, 3),
                         "H" = c(2, 3, 4))

I tried using grep to get the column index using the column names. Based on the example data, the code I tried was:

df[, colnames(select(df, "A":"B", "D", "F":"H"))] <- lapply(grep(colnames(select(df, c("A":"B", "D", "F":"H"))), df),
                                                            FUN = function(x) misty::item.reverse(x, min = 1, max = 5))

This did not work. Testing the grep function on its own gave this:

> grep(colnames(select(df, c("A":"B", "D", "F":"H"))), df)
integer(0)
Warning message:
In grep(colnames(select(df, c("A":"B", "D", "F":"H"))), df) :
  argument 'pattern' has length > 1 and only the first element will be used
>

Any ideas? Thank you.

PaulS · Answer 1 · 2022-07-19T12:25:18.790


A possible solution, based on dplyr:

library(dplyr)

df %>% 
  mutate(across(A:H, ~ (5:1)[.x]))

#>   A B C D E F G H
#> 1 5 5 2 3 1 5 3 2
#> 2 3 4 4 4 1 4 2 3
#> 3 1 3 5 1 2 5 3 4
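How `(5:1)[.x]` works: each value is used as a position in the vector `5:1`, so 1 picks 5, 2 picks 4, and so on. That is the same mapping as the arithmetic form `(1 + 5) - x`. A small base-R sketch comparing the two, assuming the responses are whole numbers from 1 to 5 (the `x` values here are made up for illustration):

```r
# Reverse a 1-5 scale two ways.
# Assumes x holds whole numbers between 1 and 5; NA stays NA in both versions.
x <- c(1, 3, 5, 2, NA)

by_index <- (5:1)[x]      # take position x of c(5, 4, 3, 2, 1); result is integer
by_arith <- (1 + 5) - x   # same mapping as arithmetic; result is double

# Same values once the types are aligned
identical(as.numeric(by_index), by_arith)
```

The arithmetic form is a little more forgiving with bad data: in the indexing version a code of 0 is silently dropped (`(5:1)[0]` is a zero-length vector) and a 6 becomes `NA`, while `6 - x` just returns an out-of-range number you can spot later.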

edited Jul 19 '22 at 12:25

answered Jul 19 '22 at 11:34

Thanks for having commented on my solution, @Adam. I have just updated my solution, replacing `everything` with a tidyselect selection. – PaulS Jul 19 '22 at 12:26
Thank you for the suggestion. However, this solution transforms all of the columns, whereas, to keep the example similar to my problem, I said I only wanted the inversion applied to A B D F G H. I guess this is solved using `c()`? Also, is there no straightforward way to specify a list of column names, obtain the indexes for those columns and then use those indexes in lapply? – Dee G Jul 19 '22 at 12:36
If the target columns are contiguous, then my new solution works as you want. Moreover, you can replace `A:H` by `c(104:183, 222:249, 271:290)` in my updated solution, and it will work fine. – PaulS Jul 19 '22 at 12:38
I've managed to adapt your solution using `c()`, so thank you. As for the lapply problem, is this just not possible? My question was more about that than about how to do the inversion, as I had already managed to do it using column indexes, and I may want to use lapply for other functions but refer to column names rather than column indexes. Is there anything similar to the `grep` approach that works with a list of column names? – Dee G Jul 19 '22 at 12:44
@Adam's solution uses `sapply` and therefore is closer than mine to the solution with `lapply` that you want to achieve. – PaulS Jul 19 '22 at 12:51
I would do it this way, for what it's worth. – Jul 19 '22 at 13:03
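Following up on the comments: `across()` accepts the same tidyselect syntax as `select()`, so the A B D F G H subset can be written either as a range expression inside `c()` or, if the names are already in a character vector, with `all_of()`. A sketch on the question's example data, assuming dplyr 1.0 or later (the version that introduced `across()`):

```r
library(dplyr)

# Example data from the question
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

# Option 1: tidyselect ranges and single names combined with c()
out_range <- df %>%
  mutate(across(c(A:B, D, F:H), ~ (5:1)[.x]))

# Option 2: a character vector of column names, wrapped in all_of()
cols <- c("A", "B", "D", "F", "G", "H")
out_names <- df %>%
  mutate(across(all_of(cols), ~ (5:1)[.x]))
```

`all_of()` raises an error if any listed name is missing, which is a handy check after dropping columns; `any_of()` silently skips missing names instead.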

score 1 · Accepted Answer · 2022-07-19T12:55:46.350

You can use sapply() as follows. The problem in this example is that you cannot easily select ranges of columns by name.

cols <- c("A", "B", "D", "F", "G", "H")

# \(x) is the shorthand for function(x) available since R 4.1
df[,cols] <- sapply(df[,cols], \(x) (5:1)[x])
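Since the question was specifically about `lapply()`: a data frame can be indexed by a character vector of names directly, so no index lookup is needed at all. The sketch below uses a hypothetical helper `reverse_1_5()` (`6 - x`) as a stand-in for `misty::item.reverse(x, min = 1, max = 5)` from the question, only so the example has no extra dependency. `match()` is shown for when positions really are needed; it does what the `grep()` attempt was aiming for. `grep()` uses only its first pattern, and when given `df` it searched the column contents coerced to text rather than the column names, hence `integer(0)`.

```r
# Example data from the question
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

# Hypothetical helper standing in for misty::item.reverse(x, min = 1, max = 5)
reverse_1_5 <- function(x) 6 - x

cols <- c("A", "B", "D", "F", "G", "H")

# Keep only names that still exist, in case some columns were dropped earlier
cols <- intersect(cols, names(df))

# Index by name directly; df[cols] is a data frame, so lapply() walks its columns
df[cols] <- lapply(df[cols], reverse_1_5)

# If positions are needed, match() returns one index per name (NA if absent)
idx <- match(cols, names(df))
```

With the real data, the three index ranges from the question could be converted to names once with `names(df_vars)[c(104:183, 222:249, 271:290)]` and stored, after which dropping columns no longer shifts anything.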

The easiest way to select by a range of columns is to use eval_select() to return their positions by number. But if you do this, you might as well just use the dplyr solution. The code below is essentially an under-the-hood look at what dplyr does.

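library(tidyselect)
library(rlang)  # expr() comes from rlang

# Named integer vector of positions: A = 1, B = 2, D = 4, F = 6, G = 7, H = 8
col_pos <- eval_select(expr(c(A:B, D, F:H)), df)

df[,col_pos] <- sapply(df[,col_pos], \(x) (5:1)[x])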
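For what it's worth, base R's `subset()` also understands name ranges in its `select` argument: it evaluates the expression with each column name standing for that column's position. So it can turn `c(A:B, D, F:H)` into a vector of names without tidyselect. A sketch on the example data:

```r
# Example data from the question
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

# Inside `select`, A:B becomes 1:2, D becomes 4 and F:H becomes 6:8;
# names() of the subset gives the matching column names
cols <- names(subset(df, select = c(A:B, D, F:H)))

df[cols] <- lapply(df[cols], function(x) (5:1)[x])
```

The `?subset` help page describes it as a convenience function for interactive use, so in scripts it may be safer to resolve the names once like this and then keep working with the stored `cols` vector.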

edited Jul 19 '22 at 12:55

answered Jul 19 '22 at 12:45

Thank you. The real data frame is a lot larger, so the `eval_select` approach is very useful. Is there a reason to use `sapply` here, rather than `lapply`? I adapted your `tidyselect` solution using my `FUN =` code and it seems to work regardless of which I use. – Dee G Jul 19 '22 at 13:06
`sapply()` will simplify the result into a matrix, which then gets written back into the data.frame. `lapply()` will keep it as a list. But a data.frame is really a special type of list, so in this case it's fine and, as you observed, you can use either. – Jul 19 '22 at 13:07
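A small check of the difference discussed in these comments: `sapply()` simplifies to a matrix when every column's result has the same length, `lapply()` always returns a list, and assigning either into `df[, cols]` gives the same values. A sketch using three of the example columns:

```r
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1))
cols <- c("A", "B")

s <- sapply(df[, cols], function(x) (5:1)[x])  # simplified to a 3 x 2 matrix
l <- lapply(df[, cols], function(x) (5:1)[x])  # a list of two vectors

is.matrix(s)  # TRUE
is.list(l)    # TRUE

# Both shapes can be assigned back into the data frame
df_s <- df; df_s[, cols] <- s
df_l <- df; df_l[, cols] <- l
```

`lapply()` is arguably the more predictable choice in scripts, because its return shape never depends on what the function returns.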