-2

How can I group by "id", apply some arithmetic functions to the last four columns (by group), and add new rows to the df containing the results. Here is an illustrative example with 5 samples (id) and 8 columns:

    d1   d2   id  type         treat  v1_gm  v2_pct v3_pct
1   info info 1   leaf         NA     0.2    70     90
2   info info 1   flower       A      0.5    80     80
3   info info 2   leaf         NA     0.4    65     80
4   info info 2   flower       A      0.1    90     90
5   info info 3   leaf         NA     0.6    55     80
6   info info 3   stem         A      0.3    80     30
7   info info 4   leaf         NA     0.6    30     40
8   info info 4   flower       A      0.7    75     75
9   info info 5   leaf/stem    NA     0.8    80     75

Reproducible example:

df <- data.frame(matrix(NA, nrow = 9, ncol = 8), row.names=NULL)
colnames(df) <- c("d1","d2","id","type","treat","v1_gm","v2_pct","v3_pct")
df$d1 <- "info"
df$d2 <- "info"
id <- c(1,1,2,2,3,3,4,4,5)
df$id <- c(1,1,2,2,3,3,4,4,5)
df$type <- c("leaf","flower","leaf","flower","leaf","stem","leaf","flower","leaf/stem")
df$treat <- c(NA,"A",NA,"A",NA,"A",NA,"A",NA)
df$v1_gm <- c(0.2,0.5,0.4,0.1,0.6,0.3,0.6,0.7,0.8)
df$v2_pct <- c(70,80,65,90,55,80,30,75,80)
df$v3_pct <- c(90,80,80,90,80,30,40,75,75)

The result table should look something like the following. Rows 3, 6, 9, and 13 are the new rows containing the results. The new rows can be appended at the end of the table, or put in a tmp df to be added later with rbind (I can't figure out how to do it, either way). The grouping var is column "id". The function sum is used for "v1_gm". The function "mean" is used for multiple consecutive columns, here "v1_pct" and "v3_pct", that should be called by name (e.g., v1_pct:v3_pct). The value for "type" in the new rows is concatenated from "type" in the group rows, "d1" and "d2" are simply copied from the group row where treat=="A", and "treat" in the new row is assigned the value "cmb".

    d1   d2   id type         treat v1_gm  v2_pct  v3_pct
1   info info 1  leaf         NA    0.2    70      90
2   info info 1  flower       A     0.5    80      80
3   info info 1  leaf/flower  cmb   0.7    75      85
4   info info 2  leaf         NA    0.4    65      80
5   info info 2  flower       A     0.1    90      90
6   info info 2  leaf/flower  cmb   0.5    77.5    85
7   info info 3  leaf         NA    0.6    55      80
8   info info 3  stem         A     0.3    80      30
9   info info 3  leaf/stem    cmb   0.9    67.5    55
10  info info 4  leaf         NA    0.6    30      40
11  info info 4  flower       A     0.7    75      75
13  info info 4  leaf/flower  cmb   1.3    52.5    57.5
14  info info 5  leaf/stem    NA    0.8    80      75
Dave Stumped
  • 319
  • 1
  • 3
  • 12

2 Answers2

1

I'm unsure if you can add the group summaries as a row to the dataframe. You should be able to do it as a column.

library("dplyr")
res1 <- df %>% group_by(id) %>% mutate( sumV1 = sum(v1_gm),meanV2 = mean(v2_pct),meanV3 = mean(v3_pct),gr_type = paste(type,collapse="/")) %>% filter(treat == "A") %>% select(d1,d2,id,type,v1_gm=sumV1, v2_pct = meanV2, v3_pct = meanV3,type = gr_type)

this will give you the answers and then use bind_rows you'll get your desired result

final_res <- bind_rows(df,res1)
ArunK
  • 1,731
  • 16
  • 35
  • Concatenate in str_c(type, sep="/") doesn't seem to be working, but if it did I could send the result to a time file, delete all the rows where treat!="A", rename the calculated columns to the original col names, and rbind tmp with the original file. – Dave Stumped Jun 01 '16 at 12:51
  • sorry, You can use paste command instead.. I've made the changes to the answer. you can then filter using the pipe command.. – ArunK Jun 01 '16 at 13:01
  • In reality, the actual data table I have has an additional 25 columns similar to after v3_pct, all of which needed to be treated the same way as v2_pct and v3_pct (i.e., mean by group). If the last column is v25, is there a way of generalizing your code to select v1_pct : v25 to avoid typing each one separately for the calculation? – Dave Stumped Jun 01 '16 at 16:43
0

With some modification of @Arun 's answer, the following script solves the problem completely.

library("dplyr")
res1 <- df %>%  
  group_by(id) %>%  
  mutate(  
    v1_gm = sum(v1_gm),  
    v2_pct = mean(v2_pct),  
    v3_pct = mean(v3_pct),  
    type = paste(type,collapse="/")) %>%  
  filter(treat == "A") %>%  
  mutate(treat = as.character("calculated"))  
final_res1 <- bind_rows(df,res1)  
final_res1$id <- as.character(final_res1$id)  
final_res1 <- final_res1 [order(final_res1$id, final_res1$treat, na.last=FALSE),  ] 
Dave Stumped
  • 319
  • 1
  • 3
  • 12