Questions tagged [split-apply-combine]

Split-apply-combine operations are a common type of data manipulation in which a function or statistic is computed independently on several chunks of data. The chunks are defined by the values of one or more variables. Such an operation has three steps:

- Splitting the data by the values of one or more variables
- Applying a function to each chunk of data independently
- Combining the results back into one piece
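The three steps can be sketched in plain Python. This is a minimal illustration, not any package's implementation: the function `split_apply_combine` and its parameters `by`, `value`, and `func` are hypothetical names. It assumes rows are dicts, keys each chunk by a tuple of one or more grouping columns, and applies `func` to the list of values in each chunk.

```python
from collections import defaultdict


def split_apply_combine(rows, by, value, func):
    """Group dict rows by the columns in `by`, apply `func` to each group's
    `value` column, and return a dict mapping group key -> result."""
    # Split: collect each chunk's values under a tuple of its grouping values
    chunks = defaultdict(list)
    for row in rows:
        group = tuple(row[col] for col in by)
        chunks[group].append(row[value])
    # Apply and combine: run func on each chunk independently, gather in one dict
    return {group: func(values) for group, values in chunks.items()}


# Hypothetical example data
rows = [
    {"region": "north", "year": 2020, "sales": 1},
    {"region": "south", "year": 2020, "sales": 10},
    {"region": "north", "year": 2021, "sales": 3},
    {"region": "north", "year": 2020, "sales": 5},
]

print(split_apply_combine(rows, ("region",), "sales", sum))
# {('north',): 9, ('south',): 10}
print(split_apply_combine(rows, ("region", "year"), "sales", max))
# {('north', 2020): 5, ('south', 2020): 10, ('north', 2021): 3}
```

Grouping by a tuple of columns lets the same helper handle one variable or several without special cases.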

Examples of split-apply-combine operations include:

- Computing median income by country from individual-level data (possibly appending the result to the same data)
- Finding the highest score in each class from individual student scores
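Both examples can be sketched with the Python standard library. The records below are made-up illustration data, and the names `people`, `median_income`, and `best_by_class` are hypothetical. The median example also shows the "append to the same data" variant, where each group's result is written back onto every row of that group.

```python
from collections import defaultdict
from statistics import median

# Hypothetical individual-level records (illustration only)
people = [
    {"country": "FR", "income": 30000},
    {"country": "FR", "income": 50000},
    {"country": "JP", "income": 40000},
    {"country": "FR", "income": 20000},
]

# Split incomes by country, then apply median to each chunk
by_country = defaultdict(list)
for p in people:
    by_country[p["country"]].append(p["income"])
median_income = {c: median(v) for c, v in by_country.items()}

# Append variant: write each country's median back onto its rows
for p in people:
    p["country_median_income"] = median_income[p["country"]]

# Highest score per class from (class, student, score) tuples
scores = [("math", "ann", 81), ("math", "bob", 92), ("art", "cy", 77), ("art", "di", 88)]
best_by_class = {}
for cls, _student, score in scores:
    best_by_class[cls] = max(score, best_by_class.get(cls, score))

print(median_income)  # {'FR': 30000, 'JP': 40000}
print(best_by_class)  # {'math': 92, 'art': 88}
```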

Tools that streamline split-apply-combine operations are available in popular statistical computing environments (non-exhaustive list):

In the R statistical environment, dedicated packages serve this purpose:
- data.table, an extension of data.frame that is optimized for split-apply-combine operations, among other things
- dplyr and its predecessor, plyr, provide convenient syntax and fast processing for such manipulations
In Python, the pandas library provides DataFrame and Series objects with a groupby method for this type of operation.
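As a sketch of the pandas approach, here is the median-income example, assuming a DataFrame with hypothetical `country` and `income` columns and made-up values. `groupby(...).median()` returns one row per group, while `groupby(...).transform("median")` returns a result aligned with the original rows, which corresponds to appending the statistic to the same data.

```python
import pandas as pd

# Hypothetical individual-level data (illustration only)
df = pd.DataFrame({
    "country": ["FR", "FR", "JP", "FR"],
    "income": [30000, 50000, 40000, 20000],
})

# Split by country, apply median, combine into one value per country
summary = df.groupby("country")["income"].median()

# Same statistic broadcast back to every original row
df["country_median_income"] = df.groupby("country")["income"].transform("median")

print(summary)
print(df)
```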

151 questions

-3 votes, 2 answers

deleting observations in pooled time series

I have a vertically arranged (stacked) pooled time series data.frame that looks like this: date item qty_sold day_1 orange 0 day_2 orange 0 day_3 orange 0 day_4 orange 0 day_5 orange 5 day_6 orange 0 day_7 orange 8 day_8 …

r dataframe sas split-apply-combine

asked Jul 06 '13 at 22:28 by user27636
