-1

I am trying to figure out an optimized way to perform math operations based on row identifier.

Sample data set as follows:

A   B   C   D   E   F   G   H   I   J   K
0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0
1   1   2   1   1   2   1   2   1   1   2
1   1   2   1   1   2   1   2   1   1   2
1   1   2   1   1   2   1   2   1   1   2
1   1   2   1   1   2   1   2   1   1   2
2   1   2   1   1   2   1   2   1   1   2
2   1   2   1   1   2   1   2   1   1   2
2   1   2   1   1   2   1   2   1   1   2
2   1   2   1   1   2   1   2   1   1   2
3   1   2   1   1   2   1   2   1   1   2
3   1   2   1   1   2   1   2   1   1   2
3   1   2   1   1   2   1   2   1   1   2
3   1   2   1   1   2   1   2   1   1   2
4   1   2   1   1   2   1   2   1   1   2
4   1   2   1   1   2   1   2   1   1   2
4   1   2   1   1   2   1   2   1   1   2
4   1   2   1   1   2   1   2   1   1   2

I want to find sum of rows based on column A. So, final output will have four rows:

A   B   C   D   E   F   G   H   I   J   K
0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0
1   4   8   4   8   4   8   4   8   4   8
2   4   8   4   8   4   8   4   8   4   8
3   4   8   4   8   4   8   4   8   4   8
4   4   8   4   8   4   8   4   8   4   8

Since, the real data set is large. I am not able to think clearly as to how I can traverse through all the data set and get the desired operation done. sum above is just an example, I will do more complex operation. Key is to subset data based on row key and then perform operation, store and then keep doing it till last row key is reached.

Any suggestions will be helpful, thanks.

www
  • 38,575
  • 12
  • 48
  • 84
Chetan Arvind Patil
  • 854
  • 1
  • 11
  • 31

1 Answers1

2

A solution from dplyr. If you want to have a more complex operation, such as apply different functions to different columns. summarise_at or summarise_if could be more suitable than summarise_all. But it is not possible to develop a specific solution until we see a good reproducible example and problem statement of that situation.

library(dplyr)

dt2 <- dt %>%
  group_by(A) %>%
  summarise_all(funs(sum(.)))

dt2
# A tibble: 5 x 11
      A     B     C     D     E     F     G     H     I     J     K
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     0     0     0     0     0     0     0     0     0     0     0
2     1     4     8     4     4     8     4     8     4     4     8
3     2     4     8     4     4     8     4     8     4     4     8
4     3     4     8     4     4     8     4     8     4     4     8
5     4     4     8     4     4     8     4     8     4     4     8

Update

Here is a base R method from @Wen

dt2 <- aggregate(.~ A, data = dt, FUN = sum)

And a solution from data.table

library(data.table)

dt2 <- setDT(dt)[, lapply(.SD, sum), by = A]

DATA

dt <- read.table(text = "A   B   C   D   E   F   G   H   I   J   K
0   0   0   0   0   0   0   0   0   0   0
                 0   0   0   0   0   0   0   0   0   0   0
                 0   0   0   0   0   0   0   0   0   0   0
                 1   1   2   1   1   2   1   2   1   1   2
                 1   1   2   1   1   2   1   2   1   1   2
                 1   1   2   1   1   2   1   2   1   1   2
                 1   1   2   1   1   2   1   2   1   1   2
                 2   1   2   1   1   2   1   2   1   1   2
                 2   1   2   1   1   2   1   2   1   1   2
                 2   1   2   1   1   2   1   2   1   1   2
                 2   1   2   1   1   2   1   2   1   1   2
                 3   1   2   1   1   2   1   2   1   1   2
                 3   1   2   1   1   2   1   2   1   1   2
                 3   1   2   1   1   2   1   2   1   1   2
                 3   1   2   1   1   2   1   2   1   1   2
                 4   1   2   1   1   2   1   2   1   1   2
                 4   1   2   1   1   2   1   2   1   1   2
                 4   1   2   1   1   2   1   2   1   1   2
                 4   1   2   1   1   2   1   2   1   1   2",
                 header = TRUE)
www
  • 38,575
  • 12
  • 48
  • 84
  • @Wen Me neither. – www Sep 17 '17 at 02:42
  • @Wen - The data I have has non-zero values. I just wrote `0` for cleaner reproducible data here and those values are just used for complex math I wrote in the question above. – Chetan Arvind Patil Sep 17 '17 at 02:47
  • @ChetanArvindPatil Then I think , by using ycw's answer could solve your problem ~ – BENY Sep 17 '17 at 02:48
  • @ycw - Thanks. What if can't use use `dplyr` due to system issues? Possible to do with base packages? I am on enterprise version of Spotfire (`TERR`) based tool, which has issues with `dplyr`, why I don't know. – Chetan Arvind Patil Sep 17 '17 at 02:49
  • 2
    @ChetanArvindPatil then you can try `aggdata <- aggregate(.~ A, data = dt, FUN = sum)` – BENY Sep 17 '17 at 02:54
  • @Wen Thanks for providing the base R method. With your permission, I would like to add this as an update to my post. – www Sep 17 '17 at 02:56