
I have a table like the one below and would like to create suggestions based on the row values, in RStudio.

This is what I have -

id class1 class2 class3 class4 top1 top2 top3
A 0.98 0.48 0.21 0.99 0.99 0.98 0.48
B 0.22 0.31 0.41 0.11 0.41 0.31 0.22
C 0.70 0.81 0.61 0.21 0.81 0.70 0.61

I would also like to have the names of the columns that the top1, top2 and top3 values come from.

id class1 class2 class3 class4 top1 top2 top3 top1name top2name top3name
A 0.98 0.48 0.21 0.99 0.99 0.98 0.48 class4 class1 class2
B 0.22 0.31 0.41 0.11 0.41 0.31 0.22 class3 class2 class1
C 0.70 0.81 0.61 0.21 0.81 0.70 0.61 class2 class1 class3

Sample data:

This is what I have :

id A B C D E F G
A 0.98 0.48 0.21 0.97 0.47 0.20 0.19
B 0.22 0.31 0.41 0.11 0.42 0.32 0.23
C 0.70 0.81 0.61 0.21 0.82 0.71 0.62

I would like to have the column names ranked by value for each row (top1name to top7name) as well.

id A B C D E F G top1name top2name top3name top4name top5name top6name top7name
A 0.98 0.48 0.21 0.97 0.47 0.20 0.19 A D B E C F G
B 0.22 0.31 0.41 0.11 0.42 0.32 0.23 E C F B G A D
C 0.70 0.81 0.61 0.21 0.82 0.71 0.62 E B F A G C D
rgambhava
  • Does this answer your question? [dplyr mutate rowwise max of range of columns](https://stackoverflow.com/questions/32978458/dplyr-mutate-rowwise-max-of-range-of-columns) – Limey Aug 03 '21 at 07:40
  • No, it does not. I want the names of the columns that correspond to the top1 column. – rgambhava Aug 03 '21 at 07:55

1 Answer


Here is an approach using dplyr::rowwise with a custom function that generates three new columns containing the class index of the three highest-valued class columns in each row. Ties between values are captured and displayed in ascending order of class names.

library(dplyr)

get_top3_classes <- function(d) {

  # rank values in descending order; "first" lists tied columns in column order
  r <- rank(-unlist(d), ties.method = "first")
  out <- names(sort(r))[1:3] 
  out <- gsub("class", "", out) # to get column class index
  m <- matrix(out,
              ncol = 3,
              dimnames = list(NULL, paste0("top", 1:3, "class"))
              )
  as_tibble(m)
  
}

dat %>% 
  rowwise() %>% 
  mutate(get_top3_classes(
            across(matches("class"))
            )
         ) %>% 
  glimpse # for printing

#> Rows: 3
#> Columns: 11
#> Rowwise: 
#> $ id        <chr> "A", "B", "C"
#> $ class1    <dbl> 0.98, 0.22, 0.70
#> $ class2    <dbl> 0.48, 0.31, 0.81
#> $ class3    <dbl> 0.21, 0.41, 0.61
#> $ class4    <dbl> 0.99, 0.11, 0.21
#> $ top1      <dbl> 0.99, 0.41, 0.81
#> $ top2      <dbl> 0.98, 0.31, 0.70
#> $ top3      <dbl> 0.48, 0.22, 0.61
#> $ top1class <chr> "4", "3", "2"
#> $ top2class <chr> "1", "2", "1"
#> $ top3class <chr> "2", "1", "3"

# sample data from the question (define this before running the pipeline above)
dat <- data.frame(
  stringsAsFactors = FALSE,
  id = c("A", "B", "C"),
  class1 = c(0.98, 0.22, 0.7),
  class2 = c(0.48, 0.31, 0.81),
  class3 = c(0.21, 0.41, 0.61),
  class4 = c(0.99, 0.11, 0.21),
  top1 = c(0.99, 0.41, 0.81),
  top2 = c(0.98, 0.31, 0.7),
  top3 = c(0.48, 0.22, 0.61)
)

Created on 2021-08-04 by the reprex package (v0.3.0)
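
To see how the `ties.method` argument of `rank()` decides which of two tied columns is listed first, here is a small base R sketch. The row `x` is hypothetical (class1 and class4 are given the same value on purpose) and is not part of the question's data.

```r
# Hypothetical row in which class1 and class4 tie for the highest value
x <- c(class1 = 0.99, class2 = 0.48, class3 = 0.21, class4 = 0.99)

# "first": among tied values, the earlier column gets the smaller rank
by_first <- names(sort(rank(-x, ties.method = "first")))[1:3]
by_first
#> [1] "class1" "class4" "class2"

# "last": among tied values, the later column gets the smaller rank
by_last <- names(sort(rank(-x, ties.method = "last")))[1:3]
by_last
#> [1] "class4" "class1" "class2"
```

So if tied columns should appear in column order (which is alphabetical here), `"first"` looks like the setting to use.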

Update:

Here is a solution for the updated question:

library(dplyr)

get_top3_classes <- function(d) {
  
  # rank values in descending order; "first" lists tied columns in column order
  r <- rank(-unlist(d), ties.method = "first")
  out <- names(sort(r)) 
  out <- gsub("class", "", out) # to get column class index
  m <- matrix(out,
              ncol = length(out),
              dimnames = list(NULL, paste0("top", seq(length(out)), "names"))
  )
  as_tibble(m)
  
}

dat %>% 
  rowwise(id) %>% 
  mutate(get_top3_classes(
    cur_data()
  )
  ) %>% 
  glimpse # for printing

#> Rows: 3
#> Columns: 15
#> Rowwise: id
#> $ id        <chr> "A", "B", "C"
#> $ A         <dbl> 0.98, 0.22, 0.70
#> $ B         <dbl> 0.48, 0.31, 0.81
#> $ C         <dbl> 0.21, 0.41, 0.61
#> $ D         <dbl> 0.97, 0.11, 0.21
#> $ E         <dbl> 0.47, 0.42, 0.82
#> $ F         <dbl> 0.20, 0.32, 0.71
#> $ G         <dbl> 0.19, 0.23, 0.62
#> $ top1names <chr> "A", "E", "E"
#> $ top2names <chr> "D", "C", "B"
#> $ top3names <chr> "B", "F", "F"
#> $ top4names <chr> "E", "B", "A"
#> $ top5names <chr> "C", "G", "G"
#> $ top6names <chr> "F", "A", "C"
#> $ top7names <chr> "G", "D", "D"

# data 
dat <- tibble::tribble(
  ~id,   ~A,   ~B,   ~C,   ~D,   ~E,   ~F,   ~G,
  "A", 0.98, 0.48, 0.21, 0.97, 0.47,  0.2, 0.19,
  "B", 0.22, 0.31, 0.41, 0.11, 0.42, 0.32, 0.23,
  "C",  0.7, 0.81, 0.61, 0.21, 0.82, 0.71, 0.62
)

Created on 2021-08-16 by the reprex package (v2.0.1)
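
If some of the value columns contain NA and those columns should not be named, one possible variation is sketched below. It rests on an assumption about what "ignore null values" means: NA entries get no name, and the unused top columns are filled with NA. It uses base R `apply()` instead of the `rowwise()` pipeline to stay dependency-free, and `top_names_skip_na` is a made-up helper name. The two rows are the ones posted in the comments.

```r
# Rows taken from the comment thread; NA marks a missing value
dat_na <- data.frame(
  ACCNT_ID = c(289739509, 698811605),
  A = c(89.86, 79.55),
  B = c(NA_real_, NA_real_),
  C = c(NA_real_, NA_real_),
  D = c(50.82, 32.02),
  E = c(NA_real_, 19.85),
  F = c(5.52, 2.65),
  G = c(NA_real_, 23.16)
)

# Hypothetical helper: column names ordered by descending value, NA columns skipped
top_names_skip_na <- function(x) {
  r <- rank(-x, ties.method = "first", na.last = "keep")  # NA values keep rank NA
  out <- names(sort(r))      # sort() drops the NA ranks
  length(out) <- length(x)   # pad the remaining positions with NA
  out
}

vals <- as.matrix(dat_na[, LETTERS[1:7]])    # value columns A to G
res <- t(apply(vals, 1, top_names_skip_na))  # one row of names per input row
colnames(res) <- paste0("top", seq_len(ncol(res)), "names")

res[1, ]  # "A" "D" "F", followed by four NA
res[2, ]  # "A" "D" "G" "E" "F", followed by two NA

cbind(dat_na, res)  # original columns plus the name columns
```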

TimTeaFan
  • Hi, can I give a column index instead of `matches("class")`? – rgambhava Aug 03 '21 at 09:15
  • "How do you want to deal with cases where you have the same value in more than one column?" Based on alphabetical order of the column names. – rgambhava Aug 03 '21 at 09:16
  • @rgambhava this makes the problem quite a bit more difficult. I will work on it later. – TimTeaFan Aug 03 '21 at 11:23
  • @rgambhava: I updated my answer. I hope I understood you correctly in that you are only interested in the class index and not the full class names. – TimTeaFan Aug 03 '21 at 22:30
  • Hi @TimTeaFan, as I increase the number of classes to class 5, 6, 7, it is not working. Could you please help? – rgambhava Aug 16 '21 at 12:27
  • @rgambhava: Did you adjust the `ncol` and the `dimnames` arguments in the `matrix` call? If you add some sample data to your question it should be easy to help you adjust the function. – TimTeaFan Aug 16 '21 at 12:32
  • Yes, I did. I am adding some sample data. Function I used: `get_top3_classes <- function(d) { r <- rank(-unlist(d), ties.method = "last") out <- names(sort(r))[1:7] out <- gsub("class", "", out) # to get column class index m <- matrix(out, ncol = 7, dimnames = list(NULL, paste0("top", 1:7, "class")) ) as_tibble(m) }` – rgambhava Aug 16 '21 at 12:55
  • @rgambhava: I updated my answer. Please upvote if it solves your problem. – TimTeaFan Aug 16 '21 at 13:16
  • It is working, but it is showing column names for null values as well. Is it possible to ignore null values present in columns "A" to "G"? – rgambhava Aug 16 '21 at 13:40
  • ACCNT_ID A B C D E F G top1names top2names top3names top4names top5names top6names top7names
    1 289739509 89.86 NA NA 50.82 NA 5.52 NA A D F B C E G
    2 698811605 79.55 NA NA 32.02 19.85 2.65 23.16 A D G E F B C
    – rgambhava Aug 16 '21 at 13:42
  • Hi @TimTeaFan, I figured it out. Thank you so much for the help. – rgambhava Aug 16 '21 at 13:58