0

My data frame looks like this:

View(df)
Product     Value
  a           2
  b           4 
  c           3
  d           10
  e           15
  f           5
  g           6
  h           4
  i           50
  j           20
  k           35
  l           25
  m           4
  n           6
  o           30
  p           4
  q           40
  r           5
  s           3
  t           40

I want to find the 9 most expensive products and summaries the rest. It should look like this:

Product     Value 
  d           10
  e           15
  i           50
  j           20
  k           35
  l           25
  o           30
  q           40
  t           40
 rest         46

Rest is the sum of the other 11 products. I tried it with summaries, but it didn't work:

new <- df %>%
  group_by(Product)%>%
summarise((Value > 10) = sum(Value)) %>%
  ungroup()
ekad
  • 14,436
  • 26
  • 44
  • 46
Marre
  • 93
  • 1
  • 8

3 Answers3

2

We can use dplyr::row_number to effectively rank the observations after using arrange to order the data by Value. Then, we augment the Product column so that any values that aren't in the top 9 are coded as Rest. Finally, we group by the updated Product and take the sum using summarise

dat %>%
    arrange(desc(Value)) %>%
    mutate(RowNum = row_number(),
           Product = ifelse(RowNum <= 9, Product, 'Rest')) %>%
    group_by(Product) %>%
    summarise(Value = sum(Value))

# A tibble: 10 × 2
   Product Value
     <chr> <int>
1        d    10
2        e    15
3        i    50
4        j    20
5        k    35
6        l    25
7        o    30
8        q    40
9     Rest    46
10       t    40

data

dat <- structure(list(Product = c("a", "b", "c", "d", "e", "f", "g", 
"h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"
), Value = c(2L, 4L, 3L, 10L, 15L, 5L, 6L, 4L, 50L, 20L, 35L, 
25L, 4L, 6L, 30L, 4L, 40L, 5L, 3L, 40L)), .Names = c("Product", 
"Value"), class = "data.frame", row.names = c(NA, -20L))
bouncyball
  • 10,631
  • 19
  • 31
1

Another way with dplyr would be to create the outcome with do. The code becomes a bit hard to read since you need to use .$, yet you can avoid ifelse/if_else. After arranging the order by Value, you can create two vectors. One with the first nine product names and "rest". The other with the first nine values and the sum of the value of the other values. You directly create a data frame using do.

df %>%
arrange(desc(Value)) %>%
do(data.frame(Product = c(as.character(.$Product[1:9]), "Rest"),
              Value = c(.$Value[1:9], sum(.$Value[10:length(.$Value)]))))

#   Product Value
#1        i    50
#2        q    40
#3        t    40
#4        k    35
#5        o    30
#6        l    25
#7        j    20
#8        e    15
#9        d    10
#10    Rest    46
jazzurro
  • 23,179
  • 35
  • 66
  • 76
1

Here is one option using data.table

library(data.table)
setDT(df)[, i1 := .I][order(desc(Value))
          ][-(seq_len(9)), Product := 'rest'
           ][, .(Value = sum(Value), i1=i1[1L]), Product
           ][order(Product=='rest', i1)][, i1 := NULL][]
#    Product Value
#1:       d    10
#2:       e    15
#3:       i    50
#4:       j    20
#5:       k    35
#6:       l    25
#7:       o    30
#8:       q    40
#9:       t    40
#10:   rest    46
akrun
  • 874,273
  • 37
  • 540
  • 662