Questions tagged [split-apply-combine]

Split-apply-combine operations are a common type of data manipulation in which a function or statistic is computed independently on several chunks of data. The chunks are defined by the values of one or more variables. The operation consists of three steps:

Splitting data by the value of one or more variables
Applying a function to each chunk of data independently
Combining the data back into one piece

Examples of split-apply-combine operations would be:

Computing median income by country from individual-level data (possibly appending the result to the same data)
Generating highest score for each class from student scores
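As a rough illustration of the three steps, here is a minimal sketch of the median-income example using only the Python standard library. The records, country codes, and variable names are made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical individual-level records: (country, income)
records = [
    ("A", 30_000),
    ("A", 50_000),
    ("A", 40_000),
    ("B", 20_000),
    ("B", 70_000),
]

# Split: collect the incomes of each country into its own chunk
chunks = defaultdict(list)
for country, income in records:
    chunks[country].append(income)

# Apply: compute the median of each chunk independently
medians = {country: median(incomes) for country, incomes in chunks.items()}

# Combine: append each group's median to the original rows
combined = [(country, income, medians[country]) for country, income in records]
```

Here `medians` holds one value per country, while `combined` keeps the original row count with the group result attached to every row, which corresponds to the "appending the result to the same data" variant.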

Tools for streamlining split-apply-combine operations are available for popular statistical computation environments (non-exhaustive list):

In the R statistical environment there are dedicated packages for this purpose:
- data.table is an extension of data.frame that is optimized for split-apply-combine operations, among other things.
- dplyr and the original package plyr provide convenient syntax and fast processing for such manipulations.
In Python, the pandas library introduces data objects that include a group-by method for this type of operation.
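A small sketch of the pandas group-by approach, applied to the highest-score-per-class example, might look like the following. The column names and data are hypothetical, and the example assumes pandas is installed.

```python
import pandas as pd

# Hypothetical student scores; column names are illustrative only
scores = pd.DataFrame({
    "class_id": ["x", "x", "y", "y", "y"],
    "student": ["s1", "s2", "s3", "s4", "s5"],
    "score": [70, 85, 90, 60, 75],
})

# Split by class, apply max, combine into one value per class
best = scores.groupby("class_id")["score"].max()

# transform returns a result aligned with the original rows,
# so the group maximum can be appended as a new column
scores["class_best"] = scores.groupby("class_id")["score"].transform("max")
```

The first call reduces the data to one row per group; `transform` is the usual choice when the group-level result should be attached back to every original row.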

151 questions

votes

2 answers

How to summarise a table by 2 columns in R

I would like to summarise this data set by grouping 1st by period, and 2nd by Payer id so that results are shown as subtotal for any given user by month as follows: data.frame: Payer Period 1 10 1-1015 2 15 2-1015 3 14 3-1015 1 1 …

r aggregate split-apply-combine

asked Jul 12 '15 at 18:59

Chelo F

votes

1 answer

Complicated subtraction in R

I am working on a data-set that requires me to subtract information from columns. It is a repeated measure data-set where one person is tested up to a max of six times and a minimum of two times. The data are in long-format. Here's a sample…

r aggregate cumulative-sum split-apply-combine

asked Jul 08 '15 at 19:03

Sid0311

votes

2 answers

Adding aggregated counts as extra dataframe rows

I have a data frame with the letters of the English alphabet and their frequency. Now it would be nice to also know the frequency of the vowels and the consonants and the total number of occurrences - and since I want to plot all of this…

r dataframe aggregate rbind split-apply-combine

asked Jul 04 '15 at 18:19

not_a_number

votes

2 answers

Sum certain values from changing dataframe in R

I have a data frame that I would like to aggregate by adding certain values. Say I have six clusters. I then feed data from each cluster into some function that generates a value x which is then put into the output data frame. cluster year …

r group-by sum dataframe split-apply-combine

asked Jun 25 '15 at 21:02

adaml768

votes

1 answer

pandas - Perform computation against a reference record within groups

For each row of data in a DataFrame I would like to compute the number of unique values in columns A and B for that particular row and a reference row within the group identified by another column ID. Here is a toy dataset: d = {'ID' :…

python-2.7 pandas split-apply-combine

asked Feb 19 '15 at 01:08

sriramn

votes

2 answers

Collapse a character vector by value in another column r

I have a dataframe with a set of character strings in one column, and a grouping variable (a string, but could be a factor) in another. I'd like to collapse the dataframe such that the strings are collapsed into elements by grouping-variable. For…

r plyr tapply split-apply-combine

asked Jan 23 '15 at 13:43

sjgknight

votes

2 answers

R loop over levels of a factor to create a sequence of numbers for each level

I'm working on a dataframe with GPS data from beavers, the dataframe includes one column with the animal's id (see $id below) which is a factor with 26 levels. For each beaver, we have several GPS values - the number differs from animal to animal. I…

r gps split-apply-combine

asked Oct 16 '14 at 13:52

Pat

votes

1 answer

Group androgynous names and sum amount for each year in a data frame in R

I have a data frame with 4 columns titled 'year' 'name' 'sex' 'amount'. Here is a sample data set set.seed(1) data = data.frame(year=sample(1950:2000, 50, replace=TRUE),name=sample(LETTERS, 50, replace=TRUE), …

r dataframe split-apply-combine

asked Oct 13 '14 at 19:20

beck8

votes

1 answer

Java ArrayList adding current item to Previous item; remove current item

Purpose of the code is to iterate thru each item in ArrayList> listOfLists and combine previous list to current list, sort the current list and remove the next list (since already combined). This needs to happen until there is only one list left.…

java arraylist merge split-apply-combine

asked Jul 17 '14 at 05:05

shivster

votes

2 answers

Efficient conditional summing by multiple conditions in R

I'm struggling with finding an efficient solution for the following problem: I have a large manipulated data frame with around 8 columns and 80000 rows that generally includes multiple data types. I want to create a new data frame that includes the…

r dataframe aggregate multiple-conditions split-apply-combine

asked Mar 10 '14 at 02:21

Joe K.

-1

votes

1 answer

r split-apply-combine problems

I'm new to r and have a large data.frame (906 rows), and I want to (row?) split the data.frame by the first column (entries associated with the same name are together) before I apply multiple descriptive statistics (mean, standard deviation,…

r na split-apply-combine

asked Jan 28 '21 at 19:36

Paige

-1

votes

1 answer

Combining rows by index in R

EDIT: I am aware there is a similar question that has been answered, but it does not work for me on the dataset I have provided below. The above dataframe is the result of me using the spread function. I am still not sure how to consolidate…

r tidyverse split-apply-combine

asked Apr 27 '18 at 04:47

melbez

-1

votes

1 answer

A column that's omitted during split-apply-combine in pandas

I'm doing a split-apply-combine to find a total quantity for each member. The dataframe I need should have 14 columns: MemberID, DSFS_0_1, DSFS_1_2, DSFS_2_3, DSFS_3_4, DSFS_4_5, DSFS_5_6, DSFS_6_7, DSFS_7_8, DSFS_8_9, DSFS_9_10, DSFS_10_11,…

python pandas dataframe aggregate-functions split-apply-combine

asked Apr 11 '16 at 22:59

squidvision

-1

votes

2 answers

Remove NAs from each variable (column) and combine cases

I have a dataset that I am cleaning up and have certain rows (observations) which I would like to combine. The best way to explain what I am trying to do is with the following…

r dplyr split-apply-combine

asked Oct 22 '15 at 21:24

rjss

-3

votes

1 answer

Calculate mean and add in new row in R but to reflect in all the entries of a particular column

I have the dataset like below, and I read it as a csv file and load the dataframe as df Name Value1 Value1 A 2 5 A 1 5 B 3 4 B 1 4 C 0 3 C 5 …

r mean split-apply-combine

asked Jan 03 '17 at 08:03

Joe
