I have a data frame:

df <- data.frame(count=c(0,1,2,3,4,5,6), value=c(100,50,60,70,2,6,8))

  count value
1     0   100
2     1    50
3     2    60
4     3    70
5     4     2
6     5     6
7     6     8

How do I sum the values of all rows whose count is larger than some n into a single row? For example, if I choose n = 3, I want:

  count value
1     0   100
2     1    50
3     2    60
4     3    70
5    >3    16
ThomasIsCoding
Algorithman

4 Answers

Coerce the count to factor with everything above 3 collapsed into ">3". Then aggregate the values by count.

# The levels must include 0; otherwise the count == 0 row becomes NA and aggregate() drops it
df$count <- factor(ifelse(df$count > 3, ">3", df$count), levels = c(0:3, ">3"))
aggregate(value ~ count, df, sum)
#  count value
#1     0   100
#2     1    50
#3     2    60
#4     3    70
#5    >3    16

R 4.1.0 or above.

R 4.1.0 introduced the native pipe operator |> and the \(x) lambda shorthand. They can be used if the column count is to be kept as is, that is, if the transformation is only temporary.

df |>
  within(count <- factor(ifelse(count > 3, ">3", count), levels = c(0:3, ">3"))) |>
  (\(x)aggregate(value ~ count, x, sum))()
#  count value
#1     0   100
#2     1    50
#3     2    60
#4     3    70
#5    >3    16
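
Since the question asks about an arbitrary n, the same factor-and-aggregate idea could be wrapped in a small function. This is only a sketch: collapse_above is a hypothetical helper name, not something from the original answers, and it assumes the columns are called count and value as in the question.

```r
# Hypothetical helper: collapse all rows with count > n into one row labelled ">n"
collapse_above <- function(df, n) {
  label <- paste0(">", n)
  # Build the levels from the data so that no count is silently turned into NA
  keep <- sort(unique(df$count[df$count <= n]))
  lev <- c(keep, if (any(df$count > n)) label)
  df$count <- factor(ifelse(df$count > n, label, df$count), levels = lev)
  # aggregate() returns the rows in level order, so ">n" ends up last
  aggregate(value ~ count, df, sum)
}

df <- data.frame(count = c(0, 1, 2, 3, 4, 5, 6), value = c(100, 50, 60, 70, 2, 6, 8))
res <- collapse_above(df, 3)
print(res)
#   count value
# 1     0   100
# 2     1    50
# 3     2    60
# 4     3    70
# 5    >3    16
```

Because the function works on a copy, the original df keeps its numeric count column.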
Rui Barradas

We can use dplyr, grouping on a recoded count built with case_when:

library(dplyr)
df %>%
  group_by(count = case_when(count > 3 ~ '>3',
                             TRUE ~ as.character(count))) %>%
  summarise(value = sum(value), .groups = 'drop')
akrun

Another base R option using aggregate

transform(
  aggregate(
    . ~ count,
    transform(df, count = replace(count, count > 3, Inf)),
    sum
  ),
  count = replace(count, is.infinite(count), ">3")
)

gives

  count value
1     0   100
2     1    50
3     2    60
4     3    70
5    >3    16
ThomasIsCoding

Here is a dplyr solution using replace. The downside is that the result needs an extra arrange step if ">3" should be the last row; without that step the code would be quite concise.

library(dplyr)

df %>% 
  group_by(count = replace(count, count > 3, ">3")) %>% 
  summarise(value = sum(value)) %>% 
  arrange(count == ">3")
#> # A tibble: 5 x 2
#>   count value
#>   <chr> <dbl>
#> 1 0       100
#> 2 1        50
#> 3 2        60
#> 4 3        70
#> 5 >3       16

Created on 2021-08-26 by the reprex package (v0.3.0)

TimTeaFan