
I want to aggregate my data as follows:

  • Aggregate only runs of successive rows where status = 0 (rows with status = 1 stay as they are)
  • Keep age and sum up points within each run

Example data:

da <- data.frame(userid = c(1,1,1,1,2,2,2,2), status = c(0,0,0,1,1,1,0,0), age = c(10,10,10,11,15,16,16,16), points = c(2,2,2,6,3,5,5,5))

da
  userid status age points
1      1      0  10      2
2      1      0  10      2
3      1      0  10      2
4      1      1  11      6
5      2      1  15      3
6      2      1  16      5
7      2      0  16      5
8      2      0  16      5

I would like to have:

da2
  userid status age points
1      1      0  10      6
2      1      1  11      6
3      2      1  15      3
4      2      1  16      5
5      2      0  16     10
Scijens

3 Answers

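library(dplyr)

da %>%
    # grp: consecutive status == 0 rows share one id; every non-zero row gets its own id
    mutate(grp = with(rle(status),
                      rep(seq_along(values), lengths)) + cumsum(status != 0)) %>%
    group_by_at(vars(-points)) %>%      # group by userid, status, age and grp
    summarise(points = sum(points)) %>%
    ungroup() %>%
    select(-grp)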
## A tibble: 5 x 4
#  userid status   age points
#   <dbl>  <dbl> <dbl>  <dbl>
#1      1      0    10      6
#2      1      1    11      6
#3      2      0    16     10
#4      2      1    15      3
#5      2      1    16      5
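To see why the grouping key in the `mutate()` step works, here is a small base-R sketch that builds it for the example `status` column one piece at a time. The intermediate names `r`, `run_id` and `nonzero_count` are only illustrative and do not appear in the answer:

```r
# Sketch: building the run-based grouping key step by step (base R only)
status <- c(0, 0, 0, 1, 1, 1, 0, 0)   # status column from the example data

r <- rle(status)                       # runs: values 0, 1, 0 with lengths 3, 3, 2
run_id <- rep(seq_along(r$values), r$lengths)
print(run_id)                          # 1 1 1 2 2 2 3 3

nonzero_count <- cumsum(status != 0)   # increases by 1 on every non-zero row
print(nonzero_count)                   # 0 0 0 1 2 3 3 3

grp <- run_id + nonzero_count
print(grp)                             # 1 1 1 3 4 5 6 6
```

Inside a run of zeros both terms stay constant, so the whole run shares one key. Inside a run of non-zero rows the `cumsum()` term grows by one per row, so each of those rows keeps its own key.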
d.b

You can use group_by from dplyr:

library(dplyr)

# the pipe must end the line; a line starting with %>% is a syntax error in R
da %>% group_by(da$userid, cumsum(da$status), da$status) %>%
   summarise(age = max(age), points = sum(points))

Output:

  `da$userid` `cumsum(da$status)` `da$status`   age points
        <dbl>               <dbl>       <dbl> <dbl>  <dbl>
1           1                   0           0    10      6
2           1                   1           1    11      6
3           2                   2           1    15      3
4           2                   3           0    16     10
5           2                   3           1    16      5
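A sketch of the same grouping written with bare column names, so the result has ordinary column names instead of `da$userid` and similar. It assumes `status` only takes the values 0 and 1, and dplyr 1.0.0 or later for the `.groups` argument of `summarise()`; `grp` is just an illustrative helper name:

```r
library(dplyr)

da <- data.frame(userid = c(1,1,1,1,2,2,2,2), status = c(0,0,0,1,1,1,0,0),
                 age = c(10,10,10,11,15,16,16,16), points = c(2,2,2,6,3,5,5,5))

da2 <- da %>%
  # cumsum(status) stays constant across a run of zeros, so the run shares a key;
  # a status == 1 row can share its counter value with the zeros that follow it,
  # but grouping by status as well keeps them apart
  group_by(userid, grp = cumsum(status), status) %>%
  summarise(age = max(age), points = sum(points), .groups = "drop") %>%
  select(-grp)

da2
```

This should give the same five rows as `da2` in the question, although the row order follows the grouping keys.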
Quang Hoang

Exactly the same idea as above:

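library(dplyr)

# Sum points over all status == 0 rows for each userid/age
# (this does not check whether those rows are consecutive)
data1 <- da %>%
  filter(status == 0) %>%
  group_by(userid, age, status) %>%
  summarise(points = sum(points), .groups = "drop")

# Rows with status != 0, summarised the same way
data2 <- da %>%
  filter(status != 0) %>%
  group_by(userid, age, status) %>%
  summarise(points = sum(points), .groups = "drop")

da2 <- bind_rows(data1, data2)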
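Because this two-step approach groups only by userid, age and status, it does not look at row order. A quick check on a small hypothetical data frame `dd` (not from the question), where user 1 has two status = 0 rows at the same age separated by a status = 1 row, shows how it differs from the run-based key used in the first answer:

```r
library(dplyr)

# Hypothetical data: two status == 0 rows at the same age that are NOT consecutive
dd <- data.frame(userid = c(1, 1, 1), status = c(0, 1, 0),
                 age = c(10, 10, 10), points = c(1, 2, 4))

# Filter-based approach: zero rows grouped by userid/age/status only
zeros <- dd %>%
  filter(status == 0) %>%
  group_by(userid, age, status) %>%
  summarise(points = sum(points), .groups = "drop")
print(zeros)   # one row: both zero rows are merged, points = 5

# Run-based key: a zero run is split wherever a non-zero row intervenes
runs <- dd %>%
  mutate(grp = with(rle(status), rep(seq_along(values), lengths)) + cumsum(status != 0)) %>%
  group_by(userid, status, age, grp) %>%
  summarise(points = sum(points), .groups = "drop") %>%
  select(-grp)
print(runs)    # three rows: the two zero rows stay separate
```

If only successive status = 0 rows should be combined, the run-based key is the one that matches that requirement.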

You need to be more careful with how the status = 0 condition is specified. I think Quang Hoang's code only works for your specific example.

I hope this helps.

Rémi Coulaud