1

I have a following dataset with 100 variables Z1 to Z100 containing character values and 10 variables containing column numbers.

Z1 ... Z100 V1 ... V10
NA ... xyz 56 ... NA

An exemplary dataset - with 5 rows, 5 Z-variables (Z1-Z5) and 3 V-variables (V1-3) and 3 created variables V1_1-V1_3:

structure(list(Z1 = c("aaa", NA, "aaa", NA, "aaa", NA, "aaa"), 
    Z2 = c("bbb", "bbb", "bbb", "bbb", "bbb", NA, "bbb"), Z3 = c("ccc", 
    "ccc", NA, "ccc", "ccc", "ccc", "ccc"), Z4 = c("ddd", "ddd", 
    NA, "ddd", NA, "ddd", "ddd"), Z5 = c("eee", "eee", "eee", 
    "eee", "eee", "eee", NA), V1 = c(1, 3, 4, NA, 2, 2, NA), 
    V2 = c(NA, 3, 3, NA, 5, NA, 1), V3 = c(4, 4, 4, NA, 5, 5, 
    NA), V1_1 = c("aaa", "ccc", NA, NA, "bbb", NA, NA), V2_1 = c(NA, 
    "ccc", NA, NA, "eee", NA, "aaa"), V3_1 = c("ddd", "ddd", 
    NA, NA, "eee", "eee", NA)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L))

I'd like to create a set of 3 new variables V1_1 to V3_1, where each will contain a character value from a column which number was in original V variable.

Technically one could do the following:

dataset %>% mutate(., V1_1 = c_across(1:5)[V1],
...
V3_1 = c_across(1:5)[V3]
)

Just not to do it step by step, I've tried the following with across:

dataset %>% 
mutate(across(V1:V3, dataset[,.], .names = "{col}_1"))

But I got an error:

Error in `mutate()`:
! Problem while computing `..1 = across(V1:V3, m3[, .], .names = "{col}_1")`.
Caused by error in `.subset()`:
! invalid subscript type 'list'
  • Could you provide a little bit of sample data in valid R syntax so it's easy to test solutions? Five Z columns and 2 or 3 V columns would be plenty to illustrate the problem. – Gregor Thomas Oct 05 '22 at 12:39
  • Also, **if** your real data has more than 1 row, please use 2 rows of example data. There might be shortcuts if your input is 1 row that wouldn't generalize well to multirow data. – Gregor Thomas Oct 05 '22 at 12:53
  • @GregorThomas Here is a cloud-based .csv: https://drive.google.com/file/d/1wclUb0SRef51hPB58wZ87uO1HJi-fiGH/view?usp=sharing – Jakub Morze Oct 05 '22 at 13:20
  • Sorry, I don't download files for Stack Overflow questions - takes too long, and often ends up with follow-up questions about column classes and such, and the question isn't a good resource for future users when your link goes stale. Could you please just put a tiny example in the question? If `sample_data` is the name of that data frame you posted in Drive, then `dput(sample_data)` will give R code to replicate it including all class and structure info--just put the `dput()` output directly in your question. – Gregor Thomas Oct 05 '22 at 13:26
  • 1
    Now, it's revised! – Jakub Morze Oct 05 '22 at 13:57
  • Thanks, that's perfect. And welcome to the site! – Gregor Thomas Oct 05 '22 at 14:12

1 Answers1

0

The "tidy" way to approach this problem would be to convert your data to long format with a single Z column and maybe a separate data set with a V column. But we can skip that by using matrix indexing.

Here, we break apart your data into a z matrix and a v data frame, and then extract the correct z elements easily using a matrix of row and column indices in [ to pull them out. I used a _result suffix so that we can compare with your _1 suffix and see that the solution worked.

z = dataset %>% select(starts_with("Z")) %>% as.matrix
v = dataset %>% select(matches("V[0-9]+$"))
result = lapply(v, \(x) z[cbind(seq_along(x), x)])
names(result) = paste0(names(result), "_result")

dataset %>% cbind(result)
#     Z1   Z2   Z3   Z4   Z5 V1 V2 V3 V1_1 V2_1 V3_1 V1_result V2_result V3_result
# 1  aaa  bbb  ccc  ddd  eee  1 NA  4  aaa <NA>  ddd       aaa      <NA>       ddd
# 2 <NA>  bbb  ccc  ddd  eee  3  3  4  ccc  ccc  ddd       ccc       ccc       ddd
# 3  aaa  bbb <NA> <NA>  eee  4  3  4 <NA> <NA> <NA>      <NA>      <NA>      <NA>
# 4 <NA>  bbb  ccc  ddd  eee NA NA NA <NA> <NA> <NA>      <NA>      <NA>      <NA>
# 5  aaa  bbb  ccc <NA>  eee  2  5  5  bbb  eee  eee       bbb       eee       eee
# 6 <NA> <NA>  ccc  ddd  eee  2 NA  5 <NA> <NA>  eee      <NA>      <NA>       eee
# 7  aaa  bbb  ccc  ddd <NA> NA  1 NA <NA>  aaa <NA>      <NA>       aaa      <NA>
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294