1

I've seen several threads on solutions to this, but I am struggling implementing them. I have a df with columns across the top with descriptions, and then I have a list of samples with data that are grouped by description. I need to extract the values where the descriptions match the column names.

I have tried different solutions using match, cbind, sapply...etc, but get errors about an invalid type(matrix) or having duplicate row names.

 df1
 #row   description    sample    ball   square    circle
 1      ball           1a        .78      .04      .22
 2      ball           7b3       .32      .33      .33
 3      square         aaabc     .02      .90      .05
 4      circle         ggg3      .05      .04      .90
 5      circle         44        .01      .25      .70

My output would be:

 df2
 #row   description    sample    value
 1      ball           1a        .78      
 2      ball           7b3       .32      
 3      square         aaabc     .90      
 4      circle         ggg3      .90
 5      circle         44        .70

And then taking that one step further, I would then filter it

 df2 %>%
 filter(value < .9) %>%
 select(description, sample, value)

Resulting in:

 #row   description    sample    value
 1      ball           1a        .78      
 2      ball           7b3       .32      
 3      circle         44        .70

I know this is a duplicate, I'm just drawing a blank as to why I can't get the solutions to work with this data set.

Steve
  • 588
  • 4
  • 17
  • Possible duplicate https://stackoverflow.com/questions/54510225/how-to-use-column-value-as-column-name/ and might be helpful. – Ronak Shah Feb 04 '19 at 14:59

2 Answers2

2

We can use a row/column indexing to extract the values that match the column names with the 'description' column values

m1 <- cbind(seq_len(nrow(df1)), match(df1$description, names(df1)[3:5]))
data.frame(df1[1:3], value = df1[3:5][m1])
#  description sample ball value
#1        ball     1a 0.78  0.78
#2        ball    7b3 0.32  0.32
#3      square  aaabc 0.02  0.90
#4      circle   ggg3 0.05  0.90
#5      circle     44 0.01  0.70

Or with tidyverse

library(tidyverse)
df1 %>% 
   rowwise %>% 
   transmute(description, sample, value = get(description))
# A tibble: 5 x 3
#  description sample value
#  <chr>       <chr>  <dbl>
#1 ball        1a      0.78
#2 ball        7b3     0.32
#3 square      aaabc   0.9 
#4 circle      ggg3    0.9 
#5 circle      44      0.7 

data

df1 <- structure(list(description = c("ball", "ball", "square", "circle", 
 "circle"), sample = c("1a", "7b3", "aaabc", "ggg3", "44"), ball = c(0.78, 
 0.32, 0.02, 0.05, 0.01), square = c(0.04, 0.33, 0.9, 0.04, 0.25
 ), circle = c(0.22, 0.33, 0.05, 0.9, 0.7)), class = "data.frame", 
  row.names = c("1", 
  "2", "3", "4", "5"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thankyou! The cbind solution gave me the same issues that i was already having."i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657." But the tidyverse solution is perfect! – Steve Feb 04 '19 at 18:52
  • @Steve Looks like you have a `data.table`. In that case, you need `setDF(df1)` and then use the same code or subset the columns using `with = FALSE` – akrun Feb 05 '19 at 01:25
0

It seems that you have percentages of likelihood. So, you are basically trying to extract the column with the highest likelihood of occurrence, something like extracting the maximum value per row among those 3 rows. So:

First we create a function to extract the max per row among 3 columns

    funcionMax <- function(unDf) {
  numFilas <- nrow(unDf)
  vectorMax <- vector()

  for(i in 1:numFilas)
  {
    vectorMax[i]<- max(unDf[i,1],unDf[i,2],unDf[i,3])

  }
  vectorMax
}

Then, we subset to deal only with those 3 columns, and apply the new function:

vectorFuncionMax <- df %>% select(ball,square,circle) %>% funcionMax
cbind(df,vectorFuncionMax)

And that's it. You are welcome.

Jorge Lopez
  • 467
  • 4
  • 10
  • Thanks for the answer! Unfortunately I am not interested in the highest value in each row. (Row 2 has a higher value in a different description.) – Steve Feb 04 '19 at 18:50
  • If that's the case then another way that sounds logical to me to do it is to use gather from dplyr to get the column names into a single column and creating a df with that. Then using merge to get the corresponding values. – Jorge Lopez Feb 04 '19 at 22:48