I am trying to figure out an optimized way to perform math operations on rows grouped by a row identifier.
A sample data set is as follows:
A B C D E F G H I J K
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
1 1 2 1 1 2 1 2 1 1 2
1 1 2 1 1 2 1 2 1 1 2
1 1 2 1 1 2 1 2 1 1 2
1 1 2 1 1 2 1 2 1 1 2
2 1 2 1 1 2 1 2 1 1 2
2 1 2 1 1 2 1 2 1 1 2
2 1 2 1 1 2 1 2 1 1 2
2 1 2 1 1 2 1 2 1 1 2
3 1 2 1 1 2 1 2 1 1 2
3 1 2 1 1 2 1 2 1 1 2
3 1 2 1 1 2 1 2 1 1 2
3 1 2 1 1 2 1 2 1 1 2
4 1 2 1 1 2 1 2 1 1 2
4 1 2 1 1 2 1 2 1 1 2
4 1 2 1 1 2 1 2 1 1 2
4 1 2 1 1 2 1 2 1 1 2
I want to find the sum of the rows for each value of column A. So the final output will have four rows:
A B C D E F G H I J K
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
1 4 8 4 4 8 4 8 4 4 8
2 4 8 4 4 8 4 8 4 4 8
3 4 8 4 4 8 4 8 4 4 8
4 4 8 4 4 8 4 8 4 4 8
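To make the sum example concrete, here is a minimal sketch in Python. The language, the function name `sum_by_key`, and the cut-down three-column data (columns A, B, C for keys 1 and 2 only) are assumptions for illustration, not part of my real setup. It makes a single pass over the rows and keeps one running total per key, so the rows do not need to be sorted.

```python
def sum_by_key(rows):
    """Sum the value columns of `rows`, grouped by the first column (the key)."""
    totals = {}
    for key, *values in rows:
        if key not in totals:
            # First row seen for this key: start its running totals.
            totals[key] = list(values)
        else:
            # Add this row's values column by column.
            acc = totals[key]
            for i, v in enumerate(values):
                acc[i] += v
    return totals


# Cut-down illustration data: columns A, B, C only, keys 1 and 2.
rows = [
    (1, 1, 2), (1, 1, 2), (1, 1, 2), (1, 1, 2),
    (2, 1, 2), (2, 1, 2), (2, 1, 2), (2, 1, 2),
]
result = sum_by_key(rows)
for key, sums in result.items():
    print(key, *sums)
# prints:
# 1 4 8
# 2 4 8
```

Memory use grows with the number of distinct keys rather than the number of rows, which is why this shape may suit a large data set.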
Since the real data set is large, I am not able to think clearly about how I can traverse the whole data set and get the desired operation done. The sum above is just an example; I will do more complex operations. The key is to subset the data based on the row key, perform the operation, store the result, and keep doing that until the last row key is reached.
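The subset, operate, store loop described above could be sketched roughly as follows in Python, using the standard library's `itertools.groupby`. This assumes the rows are already ordered by the key column, as in the sample. The names `apply_by_key` and `column_range`, and the small data set, are made up for illustration; `column_range` only stands in for whatever more complex operation is needed.

```python
from itertools import groupby
from operator import itemgetter


def apply_by_key(rows, operation):
    """Yield (key, operation(block)) for each run of consecutive rows sharing a key.

    Assumes `rows` is already ordered by its first column.
    Only one block of rows is held in memory at a time.
    """
    for key, block in groupby(rows, key=itemgetter(0)):
        # Drop the key column so `operation` sees only the value columns.
        values = [row[1:] for row in block]
        yield key, operation(values)


def column_range(block):
    """Hypothetical stand-in for a more complex operation: max minus min per column."""
    return [max(col) - min(col) for col in zip(*block)]


# Made-up illustration data: first column is the row key.
rows = [
    (1, 1, 2), (1, 3, 2), (1, 1, 5),
    (2, 4, 4), (2, 1, 2),
]
results = dict(apply_by_key(rows, column_range))
print(results)
# {1: [2, 3], 2: [3, 2]}
```

Because `groupby` only groups consecutive rows, unsorted input would need to be sorted by key first, or handled with a per-key accumulator instead.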
Any suggestions will be helpful, thanks.