Sum up specific rows for each column

Question

I'm sure my problem is easy to solve, but unfortunately I can't come up with a simple solution myself. I want to sum up certain rows of a dataset for each column.

My dataset looks like this:

```
    GIVN  MICP  GFIP
-2  0.01  0.02  0.01
-1  0.03 -0.01  0.01
0  -0.02 -0.03  0.01
1  -0.04  0.05 -0.02
2   0.01  0.02  0.03
```

Now I want each column summed twice: once over rows -1 to 1, and once over rows -2 to 1.

For rows -1 to 1, the result should look like this:

```
   GIVN  MICP  GFIP
  -0.03  0.01  0.00
```

With `colSums` I only get the sum of all rows in each column, which is not what I want.
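To make the difference concrete, here is a minimal base-R sketch that rebuilds the example data (assuming the labels -2 to 2 are stored as the data frame's row names) and compares `colSums` on all rows with `colSums` on a subset selected by row name.

```r
# Example data from the question; the labels -2..2 are stored as row names
df <- data.frame(
  GIVN = c(0.01, 0.03, -0.02, -0.04, 0.01),
  MICP = c(0.02, -0.01, -0.03, 0.05, 0.02),
  GFIP = c(0.01, 0.01, 0.01, -0.02, 0.03),
  row.names = as.character(-2:2)
)

# colSums() on the whole data frame adds up all five rows of each column
all_rows <- colSums(df)
print(all_rows)

# Subsetting first restricts the sums to the rows labelled "-1", "0" and "1"
some_rows <- colSums(df[c("-1", "0", "1"), ])
print(some_rows)
```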

You might want to think about using `colSums` with a subset of your data (e.g. `data[2:4, ]` for the second, third and fourth rows of your data) — p0bs, Dec 17 '18 at 14:20
`colSums(df[which(rownames(df) == -1):which(rownames(df) == 1),])` — Sotos, Dec 17 '18 at 14:23
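The subsetting idea from these comments can be extended to several ranges at once. This is a base-R sketch, assuming the row names hold the numeric labels; `sum_rows_between` is a hypothetical helper, not part of any package. It converts the row names to numbers once and selects rows with a logical mask, so unlike the `which(...):which(...)` approach it does not rely on the rows being sorted by label.

```r
# Example data from the question; the labels -2..2 are stored as row names
df <- data.frame(
  GIVN = c(0.01, 0.03, -0.02, -0.04, 0.01),
  MICP = c(0.02, -0.01, -0.03, 0.05, 0.02),
  GFIP = c(0.01, 0.01, 0.01, -0.02, 0.03),
  row.names = as.character(-2:2)
)

# Row names are character strings, so convert them to numbers for range tests
ids <- as.numeric(rownames(df))

# Hypothetical helper: column sums over the rows whose id lies in [from, to]
sum_rows_between <- function(data, ids, from, to) {
  keep <- ids >= from & ids <= to
  colSums(data[keep, , drop = FALSE])
}

# One row of column sums per requested range (ranges may overlap)
result <- rbind(
  "-1 to 1" = sum_rows_between(df, ids, -1, 1),
  "-2 to 1" = sum_rows_between(df, ids, -2, 1)
)
print(result)
```

Because each range is evaluated independently, overlapping ranges are no problem here.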

score 0 · Answer 1 · answered Dec 17 '18 at 22:07

This is an option with dplyr, but it's a little clunky. The tricky part is that the ranges overlap, so you can't cleanly cut the IDs into mutually exclusive groups. Instead, you essentially have to build a separate data frame for each range and then bind them back together.

First, you need the row names as numbers so you can compare ranges. Row names are always stored as character, so copy them into a numeric `id` column.
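A quick check of why this step is needed, using the question's data (rebuilt here with the labels as row names, which is an assumption about how the original data was stored): `rownames()` returns character even when the labels look numeric, so the sketch copies them into a numeric `id` column before testing a range.

```r
# Example data from the question; the labels -2..2 are stored as row names
df <- data.frame(
  GIVN = c(0.01, 0.03, -0.02, -0.04, 0.01),
  MICP = c(0.02, -0.01, -0.03, 0.05, 0.02),
  GFIP = c(0.01, 0.01, 0.01, -0.02, 0.03),
  row.names = as.character(-2:2)
)

# Row names come back as character, even though they look numeric
print(class(rownames(df)))

# Copy them into a numeric column so ordinary numeric comparisons apply
df$id <- as.numeric(row.names(df))
in_range <- df$id >= -1 & df$id <= 1
print(in_range)
```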

You'll filter the data for each group of IDs; `dplyr::between` is a utility function that tests whether a number lies in a range, including both endpoints. I'm adding a variable with `mutate` to record which group each row comes from; if you don't need that spelled out, you can drop the `mutate` calls and pass a `.id` argument to `bind_rows` instead. Either way, you need some way of telling the groups apart when you summarize.
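The two utilities this paragraph relies on can be tried on toy data. This is a minimal sketch, assuming dplyr is installed; the data frames `a` and `b` are made up for illustration. `between()` keeps both endpoints, and passing `bind_rows()` a named list together with `.id` writes each element's name into a new column, which can stand in for the `mutate` calls.

```r
library(dplyr)

# between() is inclusive: both -1 and 1 count as inside the range
print(between(c(-2, -1, 0, 1, 2), -1, 1))

# Hypothetical toy inputs standing in for the filtered subsets
a <- data.frame(x = 1:2)
b <- data.frame(x = 3:5)

# With a named list, .id = "group" records which element each row came from
stacked <- bind_rows(list("-1 to 1" = a, "-2 to 1" = b), .id = "group")
print(stacked)
```

If the list elements are unnamed, `.id` falls back to their positions ("1", "2", ...) as labels.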

All of this goes inside a `bind_rows` call, which stacks data frames like `rbind` but also accepts a list of data frames and fills columns missing from some inputs with `NA`. Then `group_by` and `summarise`. If you have many columns and naming them in `summarise_at` becomes cumbersome, you could instead drop `id` (otherwise it would be summed along with the data) and use `summarise_all` or `summarise_if`.
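A small illustration of the `summarise_all` route, assuming dplyr is installed and using made-up data (`toy`): `id` is dropped first because it is numeric too and would otherwise be summed. In dplyr 1.0.0 and later, the `_all`/`_at`/`_if` variants are superseded by `across()`, which is shown alongside for comparison.

```r
library(dplyr)

# Hypothetical stacked data: a grouping label, an id, and two value columns
toy <- data.frame(
  group = c("A", "A", "B"),
  id    = c(1, 2, 3),
  x     = c(1, 2, 5),
  y     = c(10, 20, 50)
)

# Drop id, then sum every remaining non-grouping column per group
by_all <- toy %>%
  select(-id) %>%
  group_by(group) %>%
  summarise_all(sum)
print(by_all)

# Same result with across(), the dplyr >= 1.0.0 replacement
by_across <- toy %>%
  select(-id) %>%
  group_by(group) %>%
  summarise(across(everything(), sum))
print(by_across)
```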

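```r
library(dplyr)

# Assumes `df` holds the question's data, with -2..2 as row names.
# Row names are character, so copy them into a numeric id column.
df$id <- as.numeric(row.names(df))

# One labelled subset per (overlapping) range, stacked, then summed per group
bind_rows(
  df %>% filter(between(id, -1, 1)) %>% mutate(group = "-1 to 1"),
  df %>% filter(between(id, -2, 1)) %>% mutate(group = "-2 to 1")
) %>%
  group_by(group) %>%
  summarise_at(vars(GIVN:GFIP), sum)
#> # A tibble: 2 x 4
#>   group    GIVN  MICP  GFIP
#>   <chr>   <dbl> <dbl> <dbl>
#> 1 -1 to 1 -0.03  0.01  0   
#> 2 -2 to 1 -0.02  0.03  0.01
```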

Created on 2018-12-17 by the reprex package (v0.2.1)
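With dplyr 1.0.0 or later, the same pipeline can be written with `across()` and a named list passed to `bind_rows(.id = ...)`. This is a sketch under that version assumption, rebuilding the question's data with the labels as row names; it should give the same sums as the reprex output above.

```r
library(dplyr)

# Example data from the question; the labels -2..2 are stored as row names
df <- data.frame(
  GIVN = c(0.01, 0.03, -0.02, -0.04, 0.01),
  MICP = c(0.02, -0.01, -0.03, 0.05, 0.02),
  GFIP = c(0.01, 0.01, 0.01, -0.02, 0.03),
  row.names = as.character(-2:2)
)
df$id <- as.numeric(row.names(df))

# Each list element is one (possibly overlapping) range; its name becomes "group"
sums <- bind_rows(
  list(
    "-1 to 1" = filter(df, between(id, -1, 1)),
    "-2 to 1" = filter(df, between(id, -2, 1))
  ),
  .id = "group"
) %>%
  group_by(group) %>%
  summarise(across(GIVN:GFIP, sum))
print(sums)
```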
