2

i have a df structured like this:

 Ateco. Numb. Reg

   10    223   A
   11    332   A
   12    343   A
   10    223   B
   11    332   B
   12    343   B
   29    414   B
   30    434   B
   31    444   B
   32    464   B

and I want to obtain another df, where numb is the sum of the Ateco values that I select.

 Ateco.     Numb.  Reg

10_11_12    898     A
10_11_12    898     B      
   29       414     B
   30       434     B
   31       444     B
   32       464     B

how could I do?

Silvia
  • 405
  • 4
  • 17
  • Try `aggregate(Ateco. ~ ., df, FUN = paste, collapse="_")` or using `dplyr` `df %>% group_by(Numb., Reg) %>% summarie(Ateco. = paste(Ateco., collapse="_"))` or `data.table` `setDT(df)[, .(Ateco. = paste(Ateco., collapse="_")), by = .(Numb., Reg)]` – akrun Feb 04 '19 at 15:51
  • yes but I have to declare which are the Ateco values that I want to aggregate – Silvia Feb 04 '19 at 15:53
  • In your example, 10, 11, 12 are the only values, so it should automatically get pasted – akrun Feb 04 '19 at 15:54
  • you're right. I made some changes – Silvia Feb 04 '19 at 15:55
  • Is there any other condition that specifies 10:12 in one group or not? – akrun Feb 04 '19 at 16:05
  • It is not clear to me which way you need to group the 'Ateco.' values. Is there any logical conditon? IN other words, how do you decide that 10:12 would b in a group – akrun Feb 04 '19 at 16:13
  • No, it is completely subjective. Because I need to aggregate that variables only, but there is no logical condition. Only my necessity – Silvia Feb 04 '19 at 16:14
  • Let me know if you have any more updates with the data – akrun Feb 04 '19 at 16:23
  • Of course. I’m sorry but I have to leave. I answer you tomorrow morning. Thank you very much for you suggestions and availability – Silvia Feb 04 '19 at 16:24

1 Answers1

2

Based on the updated input example, grouped by 'Reg' and the presence of values 10 to 12 in 'Ateco.', get the sum of 'Numb.' and paste the 'Ateco', elements, ungroup and remove the 'grp' if needed

library(tidyverse)
df %>% 
   group_by(Reg, grp = Ateco. %in%  10:12) %>% 
   summarise(Numb. = sum(Numb.), 
            Ateco. = paste(Ateco., collapse="_")) %>%
   ungroup %>%
   select(-grp)
# A tibble: 3 x 3
#  Reg   Numb. Ateco.  
#  <chr> <int> <chr>   
#1 A       898 10_11_12
#2 B       414 29      
#3 B       898 10_11_12

If we assume the 'grp' is created based on the occurence of 'Ateco' values in both the 'Reg' elements

df %>% 
    group_by(Ateco.) %>%
    group_by(grp = n_distinct(Reg) > 1, Reg) %>% 
    summarise(Numb. = sum(Numb.),
             Ateco. = paste(Ateco., collapse="_")) %>%
    ungroup %>%
    select(-grp)

Update

Based on the new dataset

df2 %>% 
    group_by(Ateco. = case_when(Ateco. %in% 10:12 ~ '10_11_12', 
                   TRUE ~ as.character(Ateco.)), Reg) %>% 
    summarise(Numb. = sum(Numb.))
# A tibble: 6 x 3
# Groups:   Ateco. [?]
#  Ateco.   Reg   Numb.
#  <chr>    <chr> <int>
#1 10_11_12 A       898
#2 10_11_12 B       898
#3 29       B       414
#4 30       B       434
#5 31       B       444
#6 32       B       464

data

df <- structure(list(Ateco. = c(10L, 11L, 12L, 10L, 11L, 12L, 29L), 
Numb. = c(223L, 332L, 343L, 223L, 332L, 343L, 414L), Reg = c("A", 
"A", "A", "B", "B", "B", "B")), class = "data.frame", row.names = c(NA, 
 -7L))

df2 <- structure(list(Ateco. = c(10L, 11L, 12L, 10L, 11L, 12L, 29L, 
 30L, 31L, 32L), Numb. = c(223L, 332L, 343L, 223L, 332L, 343L, 
 414L, 434L, 444L, 464L), Reg = c("A", "A", "A", "B", "B", "B", 
 "B", "B", "B", "B")), class = "data.frame", row.names = c(NA, 
 -10L))
akrun
  • 874,273
  • 37
  • 540
  • 662