Summarize the lowest values in a Dataframe?

Question

My data frame looks like this:

View(df)
Product     Value
  a           2
  b           4 
  c           3
  d           10
  e           15
  f           5
  g           6
  h           4
  i           50
  j           20
  k           35
  l           25
  m           4
  n           6
  o           30
  p           4
  q           40
  r           5
  s           3
  t           40

I want to find the 9 most expensive products and summaries the rest. It should look like this:

Product     Value 
  d           10
  e           15
  i           50
  j           20
  k           35
  l           25
  o           30
  q           40
  t           40
 rest         46

Rest is the sum of the other 11 products. I tried it with summaries, but it didn't work:

new <- df %>%
  group_by(Product)%>%
summarise((Value > 10) = sum(Value)) %>%
  ungroup()

score 2 · Accepted Answer · answered Mar 04 '17 at 01:54

We can use dplyr::row_number to effectively rank the observations after using arrange to order the data by Value. Then, we augment the Product column so that any values that aren't in the top 9 are coded as Rest. Finally, we group by the updated Product and take the sum using summarise

dat %>%
    arrange(desc(Value)) %>%
    mutate(RowNum = row_number(),
           Product = ifelse(RowNum <= 9, Product, 'Rest')) %>%
    group_by(Product) %>%
    summarise(Value = sum(Value))

# A tibble: 10 × 2
   Product Value
     <chr> <int>
1        d    10
2        e    15
3        i    50
4        j    20
5        k    35
6        l    25
7        o    30
8        q    40
9     Rest    46
10       t    40

data

dat <- structure(list(Product = c("a", "b", "c", "d", "e", "f", "g", 
"h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"
), Value = c(2L, 4L, 3L, 10L, 15L, 5L, 6L, 4L, 50L, 20L, 35L, 
25L, 4L, 6L, 30L, 4L, 40L, 5L, 3L, 40L)), .Names = c("Product", 
"Value"), class = "data.frame", row.names = c(NA, -20L))

score 1 · Answer 2 · answered Mar 04 '17 at 03:23

Another way with dplyr would be to create the outcome with do. The code becomes a bit hard to read since you need to use .$, yet you can avoid ifelse/if_else. After arranging the order by Value, you can create two vectors. One with the first nine product names and "rest". The other with the first nine values and the sum of the value of the other values. You directly create a data frame using do.

df %>%
arrange(desc(Value)) %>%
do(data.frame(Product = c(as.character(.$Product[1:9]), "Rest"),
              Value = c(.$Value[1:9], sum(.$Value[10:length(.$Value)]))))

#   Product Value
#1        i    50
#2        q    40
#3        t    40
#4        k    35
#5        o    30
#6        l    25
#7        j    20
#8        e    15
#9        d    10
#10    Rest    46

score 1 · Answer 3 · answered Mar 04 '17 at 05:49

Here is one option using data.table

library(data.table)
setDT(df)[, i1 := .I][order(desc(Value))
          ][-(seq_len(9)), Product := 'rest'
           ][, .(Value = sum(Value), i1=i1[1L]), Product
           ][order(Product=='rest', i1)][, i1 := NULL][]
#    Product Value
#1:       d    10
#2:       e    15
#3:       i    50
#4:       j    20
#5:       k    35
#6:       l    25
#7:       o    30
#8:       q    40
#9:       t    40
#10:   rest    46

Summarize the lowest values in a Dataframe?

3 Answers3

data