I have a data frame with multiple columns in which every row sums to either 1 or 0.9. If one column is 0.5, then another one has to be 0.5 as well. If a column is 0.3, then the other two columns have to have the same value.

df <- data.frame(A = c(0, 0, 0.3, 0.5, 0, 0.3, 0.5), B = c(0, 0, 0.3, 0.5, 0, 0.3, 0.5), 
C = c(1, 1, 0.3, 0, 1, 0.3, 0))

What I need in the end is another column (result) that lists, for each row, the names of the columns whose values are > 0, joined with "-".

> df
    A   B   C result
1 0.0 0.0 1.0      C
2 0.0 0.0 1.0      C
3 0.3 0.3 0.3  A-B-C
4 0.5 0.5 0.0    A-B
5 0.0 0.0 1.0      C
6 0.3 0.3 0.3  A-B-C
7 0.5 0.5 0.0    A-B

Thanks!

Andrei Niță
    A slightly faster base R option could be `indx <- which(df > 0, arr.ind = TRUE) ; df$result <- tapply(names(df)[indx[, "col"]], indx[, "row"], toString)` – David Arenburg Mar 26 '20 at 14:21
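
The one-liner in the comment above can be unpacked roughly as follows. This is only a sketch: it swaps `toString` (which would give "A, B, C") for `paste(..., collapse = "-")` so the separator matches the desired output, and it assumes every row has at least one value > 0, which holds for the data in the question because each row sums to 1 or 0.9.

```r
df <- data.frame(A = c(0, 0, 0.3, 0.5, 0, 0.3, 0.5),
                 B = c(0, 0, 0.3, 0.5, 0, 0.3, 0.5),
                 C = c(1, 1, 0.3, 0, 1, 0.3, 0))

# Row/column positions of every cell that is > 0
indx <- which(df > 0, arr.ind = TRUE)

# Look up the column name for each position, then group the names by row
# and join them with "-" (assumption: no row is entirely <= 0, otherwise
# the result would be shorter than nrow(df))
df$result <- as.vector(
  tapply(names(df)[indx[, "col"]], indx[, "row"], paste, collapse = "-")
)

df
```

Because the comparison `df > 0` is done once on the whole data frame instead of once per row, this tends to scale better than a row-wise `apply` on large inputs.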

2 Answers

You can do:

df$result <- apply(df, 1, function(x) paste(names(df)[x > 0], collapse = "-"))

df
    A   B   C result
1 0.0 0.0 1.0      C
2 0.0 0.0 1.0      C
3 0.3 0.3 0.3  A-B-C
4 0.5 0.5 0.0    A-B
5 0.0 0.0 1.0      C
6 0.3 0.3 0.3  A-B-C
7 0.5 0.5 0.0    A-B
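
One thing to watch for: `apply()` converts the data frame to a matrix first, so if `df` already contains a non-numeric column (for example, if the line above is run a second time after `result` exists), the values are compared as strings and the output may be wrong. A possible safeguard, sketched here with a hypothetical `value_cols` vector naming the numeric columns, is to restrict the call to those columns:

```r
df <- data.frame(A = c(0, 0, 0.3, 0.5, 0, 0.3, 0.5),
                 B = c(0, 0, 0.3, 0.5, 0, 0.3, 0.5),
                 C = c(1, 1, 0.3, 0, 1, 0.3, 0))

# Hypothetical helper: the columns that hold the numeric values
value_cols <- c("A", "B", "C")

# Only the numeric columns are passed to apply(), so the matrix stays numeric
# even when df later gains character columns such as `result`
df$result <- apply(df[value_cols], 1,
                   function(x) paste(value_cols[x > 0], collapse = "-"))

# Running the same line again gives the same answer
df$result <- apply(df[value_cols], 1,
                   function(x) paste(value_cols[x > 0], collapse = "-"))
```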
Ritchie Sacramento

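library(tidyverse)

df %>% 
  mutate(id = row_number()) %>%                                # tag each row so it can be regrouped later
  pivot_longer(-id, names_to = "cd", values_to = "vals") %>%   # one row per (id, column name) pair
  filter(vals > 0) %>%                                         # keep only the positive values
  group_by(id) %>% 
  summarise(new_val = paste(cd, collapse = "-")) %>%           # join the remaining column names per row
  ungroup() %>% 
  cbind(df, .)                                                 # bind back to df; adds id and new_val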
akash87