summarize from two differents rows

Question

This is my starting df

test <- data.frame(year = c(2018,2018,2018,2018,2018), 
                    source = c("file1", "file1", "file1", "file1", "file1"),
                    area = c("000", "000", "800", "800", "800"),
                    cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"), 
                    value = c(1000,2000,3000,4000,5000))

  year source area cult2 value
1 2018  file1  000 PBGEX  1000
2 2018  file1  000 QPGEX  2000
3 2018  file1  800 PBGEX  3000
4 2018  file1  800 QPGEX  4000
5 2018  file1  800 QPIND  5000

I need to get for each year/source/area the sum of value, for the fields PBGEX and QPGEX. I was thinking of using spread and gather but I m losing many others columns (not show here).

This what I would except :

  year source area cult2 value
1 2018  file1  000 PBGEX  1000
2 2018  file1  000 QPGEX  2000
3 2018  file1  800 PBGEX  3000
4 2018  file1  800 QPGEX  4000
5 2018  file1  800 QPIND  5000
6 2018  file1  000 RDGEX  3000
7 2018  file1  800 RDGEX  7000

akrun · Accepted Answer · 2020-12-19T19:58:06.553

We can filter the rows where the 'cult2' are 'QPGEX', 'PBGEX', then do a group_by sum and bind_rows with original dataset

library(dplyr)
test %>%
    filter(cult2 %in% c("QPGEX", "PBGEX")) %>% 
    group_by(year, source, area) %>%
    summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
    bind_rows(test, .)

-output

#   year source area cult2 value
#1 2018  file1  000 PBGEX  1000
#2 2018  file1  000 QPGEX  2000
#3 2018  file1  800 PBGEX  3000
#4 2018  file1  800 QPGEX  4000
#5 2018  file1  800 QPIND  5000
#6 2018  file1  000 RDGEX  3000
#7 2018  file1  800 RDGEX  7000

If we need a proportion column

test %>%
 filter(cult2 %in% c("QPGEX", "PBGEX")) %>% 
 group_by(year, source, area) %>%
 group_by(prop = value[cult2== "QPGEX"]/value[cult2 == "PBGEX"],
        .add = TRUE) %>% 
 summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>% 
 bind_rows(test, .)

Or it can be also

library(tidyr)
test %>% 
   filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
   pivot_wider(names_from = cult2, values_from = value) %>% 
   # or use spread
   #spread(cult2, value) %>%
   mutate(prop = QPGEX/PBGEX) %>% 
   select(-PBGEX, -QPGEX) %>%
   right_join(test)

-output

# A tibble: 5 x 6
#   year source area   prop cult2 value
#  <dbl> <chr>  <chr> <dbl> <chr> <dbl>
#1  2018 file1  000    2    PBGEX  1000
#2  2018 file1  000    2    QPGEX  2000
#3  2018 file1  800    1.33 PBGEX  3000
#4  2018 file1  800    1.33 QPGEX  4000
#5  2018 file1  800    1.33 QPIND  5000

Thanks this is very good, but I also need the same calculation with a division instead of a sum, the result should be the result of QPGEX/PBGEX...Maybe you have an idea how to do this ? — krifur, Dec 19 '20 at 19:50
oups sorry forget to say, my R version is not up to date, I can't use pivot_wider :/edit: there is already two solutions, ok I m checking this, thanks again for help — krifur, Dec 19 '20 at 19:56
@krifur you can change it to `spread`, updated with a commented line — akrun, Dec 19 '20 at 19:58

summarize from two differents rows

1 Answers1