How do I create a column based on values in another column which are the names of variables in my dataframe whose data I want to fill newcol with? R

Question

I apologize if my question is confusingly worded; I haven't been able to find similar threads that would help me phrase it more clearly.

I am working with a sample of data which resembles that seen below:

label1	label2	label3	label#
value1	value4	value7	label2
value2	value5	value8	label1
value3	value6	value9	label3

I'm trying to create a new column, 'currentvalue', which reads in the value of label# in a certain row, then for that row populates the column with that row's value of whatever column is named in label#. In other words, I want my output to look like this:

label1	label2	label3	label#	currentvalue
value1	value4	value7	label2	value4
value2	value5	value8	label1	value2
value3	value6	value9	label3	value9

The only solutions I can think of involve multiple for loops, which I imagine are computationally inefficient. I've been searching Stack Overflow for threads that could help me write a vectorized solution, but I don't think I've articulated the problem very well, because none of my searches were helpful. Any help is appreciated (including help stating my question better).

score 1 · Answer 1 · answered Dec 15 '21 at 13:13

The easiest way to do this would be to use get in a rowwise operation with dplyr:

library(dplyr)

dat %>% rowwise() %>%
    mutate(curr_value = get(`label#`)) %>%
    ungroup()

# A tibble: 3 × 5
  label1 label2 label3 `label#` curr_value
  <chr>  <chr>  <chr>  <chr>    <chr>     
1 value1 value4 value7 label2   value4    
2 value2 value5 value8 label1   value2    
3 value3 value6 value9 label3   value9

Park · Accepted Answer · 2021-12-14T08:09:45.500

0

It's a bit messy and I think there might be a better way, but you may try

library(dplyr)
library(tibble)
    
df <- read.table(text = "label1 label2  label3  label#
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3", header = TRUE)

df %>%
  rowwise %>%
  rownames_to_column(., "row") %>%
  mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])

  row   label1 label2 label3 label  currentvalue
  <chr> <chr>  <chr>  <chr>  <chr>  <chr>       
1 1     value1 value4 value7 label2 value4      
2 2     value2 value5 value8 label1 value2      
3 3     value3 value6 value9 label3 value9

When I read your data with read.table, the column name label# becomes label.
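The rename most likely happens because read.table treats # as a comment character by default (comment.char = "#"), so everything from # onward in the header line is dropped. A minimal sketch, assuming you want to keep the original name: switch off comment handling and name checking. The name df_raw is chosen here only for illustration.

```r
# Read the sample data while keeping the literal header `label#`.
# comment.char = "" stops `#` from starting a comment;
# check.names = FALSE stops R from rewriting non-syntactic names.
df_raw <- read.table(text = "label1 label2  label3  label#
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3",
  header = TRUE, comment.char = "", check.names = FALSE)

names(df_raw)
```

With both arguments set, the fourth column should come through as label# without a manual rename.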

If the column name is `label#`:

names(df)[4] <- "label#"

df %>%
  rowwise %>%
  rownames_to_column(., "row") %>%
  mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == `label#`)]])

  row   label1 label2 label3 `label#` currentvalue
  <chr> <chr>  <chr>  <chr>  <chr>    <chr>       
1 1     value1 value4 value7 label2   value4      
2 2     value2 value5 value8 label1   value2      
3 3     value3 value6 value9 label3   value9

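Using base R (with the selector column named label):

# Column position named in each row's `label` value
x <- match(df$label, names(df))
# Row positions
y <- seq_len(nrow(df))
z <- data.frame(y, x)
# For each (row, column) pair, pull the matching cell
df$currentvalue <- apply(z, 1, function(idx) df[idx[1], idx[2]])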
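The same lookup can also be written without apply, by indexing the data frame with a two-column matrix of (row, column) positions. This is a sketch rather than part of the benchmark below; the names rows and cols are illustrative, and the selector column is assumed to be called label. Note that matrix indexing goes through as.matrix, so columns of mixed types would be coerced to a common type.

```r
# Sample data from the question; the selector column is named `label` here
df <- data.frame(
  label1 = c("value1", "value2", "value3"),
  label2 = c("value4", "value5", "value6"),
  label3 = c("value7", "value8", "value9"),
  label  = c("label2", "label1", "label3")
)

rows <- seq_len(nrow(df))             # one row index per row
cols <- match(df$label, names(df))    # column position named in each row

# A two-column matrix index picks one cell per (row, column) pair
df$currentvalue <- df[cbind(rows, cols)]
df
```

A value in label that matches no column name gives NA in cols, and the corresponding currentvalue is NA.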

Time check

microbenchmark::microbenchmark(
  a = {
    df %>%
      rowwise %>%
      rownames_to_column(., "row") %>%
      mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])
  },
  b = {
    x <- match(df$label, names(df))
    y <- 1:nrow(df)
    z <- data.frame(y, x)
    df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])
  }
)

Unit: microseconds
 expr    min      lq     mean  median     uq     max neval cld
    a 6157.8 6861.95 8773.098 7465.75 9367.1 26232.8   100   b
    b  360.6  399.75  692.073  488.40  666.9  4225.0   100  a

edited Dec 14 '21 at 08:09

answered Dec 14 '21 at 07:20

Park

Thanks for taking the time to answer! For some reason, this code seems to run forever when I swap in my specific column name for label#/label. Do you know what might be causing this? – nlplearner Dec 14 '21 at 07:31
@nlplearner I cannot understand the **swap in my specific column name for label#/label** part. I added the code for the case where `label#` is the 4th column name. – Park Dec 14 '21 at 07:34
I have edited my first comment so hopefully it is more clear. Here it is: – nlplearner Dec 14 '21 at 07:38
Thanks for taking the time to answer! For some reason, this code seems to run forever when I change 'label#' to the name of my bigger dataset's column. Do you know what might be causing this? – nlplearner Dec 14 '21 at 07:39
@nlplearner Oh...Can you tell me the dimension of your new dataset? – Park Dec 14 '21 at 07:45
@nlplearner I should not use `rowwise` with that kind of data... – Park Dec 14 '21 at 07:51
@nlplearner I added a new solution. It's about 15 times faster than my previous one. – Park Dec 14 '21 at 07:57
Thank you!! How do you decide what values to give object 'y' in this new solution? Should it be 1:nrow? – nlplearner Dec 14 '21 at 08:04
@nlplearner Yeap. That's right. – Park Dec 14 '21 at 08:08

score 0 · Answer 3 · answered Dec 14 '21 at 07:50

A solution using dplyr and purrr. imap_chr applies a function to each element of label# together with its index. The first argument (.x) is the value in label#, and the second argument (.y) is the row number.

Row-wise operations are usually slow on large data frames, so avoid rowwise and use an alternative where possible.

library(dplyr)
library(purrr)

dat2 <- dat %>%
  mutate(currentvalue = imap_chr(`label#`, ~dat[.y, .x]))
dat2
#   label1 label2 label3 label# currentvalue
# 1 value1 value4 value7 label2       value4
# 2 value2 value5 value8 label1       value2
# 3 value3 value6 value9 label3       value9

Data

dat <- read.table(text = "label1 label2  label3  label
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3", header = TRUE) %>%
  setNames(c("label1", "label2", "label3", "label#"))
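One caveat, offered as a guess rather than a diagnosis: if the data is a tibble instead of a base data.frame, single-bracket indexing such as dat[.y, .x] returns a one-cell tibble (a list) rather than a string, and imap_chr may then fail with a "Can't coerce ... from a list to a character" error. A sketch using double-bracket indexing, which extracts the cell itself for both tibbles and data frames. The name dat_tbl is hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical tibble version of the sample data
dat_tbl <- tibble(
  label1 = c("value1", "value2", "value3"),
  label2 = c("value4", "value5", "value6"),
  label3 = c("value7", "value8", "value9"),
  `label#` = c("label2", "label1", "label3")
)

# [[row, column]] returns the single cell value instead of a 1x1 tibble
dat_tbl2 <- dat_tbl %>%
  mutate(currentvalue = imap_chr(`label#`, ~ dat_tbl[[.y, .x]]))
dat_tbl2
```

This still requires the looked-up columns to hold plain character values; list columns would need to be unnested or converted first.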

answered Dec 14 '21 at 07:50

www

Thank you for taking the time to respond! When I substitute in the df and column names from my larger data, I get the following error. Do you know how I can fix it? Error: Problem with `mutate()` column `currentvalue`. ℹ `currentvalue = imap_chr(wordstim, ~responsebtwn[.y, .x])`. x Can't coerce element 1 from a list to a character – nlplearner Dec 14 '21 at 07:58
@nlplearner Do you have duplicated column names in `responsebtwn`? – www Dec 14 '21 at 08:01
No, they take the form you see above but with a word other than "label" in front of each number – nlplearner Dec 14 '21 at 08:07
@nlplearner Do you have list columns in your data frame? – www Dec 14 '21 at 08:08
I think so, but I'm not sure how to convert them. I'll work on that now, thank you! – nlplearner Dec 14 '21 at 08:14