Sum values according to specific condition in R

Question

i have a df structured like this:

 Ateco. Numb. Reg

   10    223   A
   11    332   A
   12    343   A
   10    223   B
   11    332   B
   12    343   B
   29    414   B
   30    434   B
   31    444   B
   32    464   B

and I want to obtain another df, where numb is the sum of the Ateco values that I select.

 Ateco.     Numb.  Reg

10_11_12    898     A
10_11_12    898     B      
   29       414     B
   30       434     B
   31       444     B
   32       464     B

how could I do?

Try `aggregate(Ateco. ~ ., df, FUN = paste, collapse="_")` or using `dplyr` `df %>% group_by(Numb., Reg) %>% summarie(Ateco. = paste(Ateco., collapse="_"))` or `data.table` `setDT(df)[, .(Ateco. = paste(Ateco., collapse="_")), by = .(Numb., Reg)]` — akrun, Feb 04 '19 at 15:51
yes but I have to declare which are the Ateco values that I want to aggregate — Silvia, Feb 04 '19 at 15:53
In your example, 10, 11, 12 are the only values, so it should automatically get pasted — akrun, Feb 04 '19 at 15:54
Is there any other condition that specifies 10:12 in one group or not? — akrun, Feb 04 '19 at 16:05
It is not clear to me which way you need to group the 'Ateco.' values. Is there any logical conditon? IN other words, how do you decide that 10:12 would b in a group — akrun, Feb 04 '19 at 16:13
No, it is completely subjective. Because I need to aggregate that variables only, but there is no logical condition. Only my necessity — Silvia, Feb 04 '19 at 16:14
Of course. I’m sorry but I have to leave. I answer you tomorrow morning. Thank you very much for you suggestions and availability — Silvia, Feb 04 '19 at 16:24

akrun · Accepted Answer · 2019-02-04T16:16:22.160

Based on the updated input example, grouped by 'Reg' and the presence of values 10 to 12 in 'Ateco.', get the sum of 'Numb.' and paste the 'Ateco', elements, ungroup and remove the 'grp' if needed

library(tidyverse)
df %>% 
   group_by(Reg, grp = Ateco. %in%  10:12) %>% 
   summarise(Numb. = sum(Numb.), 
            Ateco. = paste(Ateco., collapse="_")) %>%
   ungroup %>%
   select(-grp)
# A tibble: 3 x 3
#  Reg   Numb. Ateco.  
#  <chr> <int> <chr>   
#1 A       898 10_11_12
#2 B       414 29      
#3 B       898 10_11_12

If we assume the 'grp' is created based on the occurence of 'Ateco' values in both the 'Reg' elements

df %>% 
    group_by(Ateco.) %>%
    group_by(grp = n_distinct(Reg) > 1, Reg) %>% 
    summarise(Numb. = sum(Numb.),
             Ateco. = paste(Ateco., collapse="_")) %>%
    ungroup %>%
    select(-grp)

Update

Based on the new dataset

df2 %>% 
    group_by(Ateco. = case_when(Ateco. %in% 10:12 ~ '10_11_12', 
                   TRUE ~ as.character(Ateco.)), Reg) %>% 
    summarise(Numb. = sum(Numb.))
# A tibble: 6 x 3
# Groups:   Ateco. [?]
#  Ateco.   Reg   Numb.
#  <chr>    <chr> <int>
#1 10_11_12 A       898
#2 10_11_12 B       898
#3 29       B       414
#4 30       B       434
#5 31       B       444
#6 32       B       464

data

df <- structure(list(Ateco. = c(10L, 11L, 12L, 10L, 11L, 12L, 29L), 
Numb. = c(223L, 332L, 343L, 223L, 332L, 343L, 414L), Reg = c("A", 
"A", "A", "B", "B", "B", "B")), class = "data.frame", row.names = c(NA, 
 -7L))

df2 <- structure(list(Ateco. = c(10L, 11L, 12L, 10L, 11L, 12L, 29L, 
 30L, 31L, 32L), Numb. = c(223L, 332L, 343L, 223L, 332L, 343L, 
 414L, 434L, 444L, 464L), Reg = c("A", "A", "A", "B", "B", "B", 
 "B", "B", "B", "B")), class = "data.frame", row.names = c(NA, 
 -10L))

It gives me problems if there are multiple values different from 10, 11, 12. I update the input df — Silvia, Feb 04 '19 at 16:10
It aggregates the values that I want to aggregate, but also all the others. — Silvia, Feb 04 '19 at 16:13

Sum values according to specific condition in R

1 Answers1

Update

data