Questions tagged [split-apply-combine]

Split-apply-combine operations are a common type of data manipulation in which a function or statistic is computed independently on several chunks of data. The chunks are defined by the values of one or more variables. The operation has three steps:

Splitting data by the value of one or more variables
Applying a function to each chunk of data independently
Combining the data back into one piece
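
The three steps above can be sketched in plain Python, without any dedicated library. This is only an illustration: the `records` data and all variable names are made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (class_name, score) pairs.
records = [("A", 71), ("B", 88), ("A", 93), ("B", 64), ("C", 80)]

# Split: bucket the scores by the value of the grouping variable.
chunks = defaultdict(list)
for class_name, score in records:
    chunks[class_name].append(score)

# Apply: compute a statistic on each chunk independently.
applied = {class_name: max(scores) for class_name, scores in chunks.items()}

# Combine: gather the per-chunk results back into one piece.
combined = sorted(applied.items())
print(combined)  # [('A', 93), ('B', 88), ('C', 80)]
```

Dedicated tools mostly fuse these steps into a single call, but the underlying structure stays the same.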

Examples of split-apply-combine operations would be:

Computing median income by country from individual-level data (possibly appending the result to the same data)
Finding the highest score for each class from student scores
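
As a rough sketch, both examples might look like this in pandas. The data, column names, and variable names are invented for illustration. `transform` is used for the "appending the result to the same data" variant because it returns one value per original row.

```python
import pandas as pd

# Hypothetical individual-level income data.
people = pd.DataFrame({
    "country": ["DE", "DE", "DE", "FR", "FR"],
    "income": [30_000, 42_000, 51_000, 28_000, 36_000],
})

# Median income by country: one row per group.
median_by_country = people.groupby("country")["income"].median()

# Same statistic, broadcast back onto every original row.
people["country_median"] = (
    people.groupby("country")["income"].transform("median")
)

# Hypothetical student scores.
scores = pd.DataFrame({
    "class": ["A", "A", "B", "B", "B"],
    "score": [71, 93, 88, 64, 90],
})

# Highest score for each class.
top_by_class = scores.groupby("class")["score"].max()
```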

Tools for streamlining split-apply-combine operations are available for popular statistical computation environments (non-exhaustive list):

In the R statistical environment there are dedicated packages for this purpose:
- data.table is an extension of data.frame that is optimized for split-apply-combine operations, among other things
- dplyr and its predecessor plyr provide convenient syntax and fast processing for such manipulations
In Python, the pandas library provides data objects (such as DataFrame) with a groupby method for this type of operation.
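
A minimal sketch of the pandas `groupby` method, assuming a small made-up DataFrame. `groupby` returns a grouped object that can be iterated chunk by chunk (the split) or aggregated in one call (apply and combine). The column and result names here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

grouped = df.groupby("group")

# Split: iterating yields (key, sub-DataFrame) pairs.
sizes = {key: len(chunk) for key, chunk in grouped}

# Apply and combine: several named aggregations in one call.
summary = grouped.agg(total=("value", "sum"), largest=("value", "max"))
```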

151 questions

1 answer

Pandas: How to combine several rows with the same column value and create a new Dataframe which covers all possibilities?

There exists a DataFrame like this:

id   name   age
0x0  Hans   32
0x0  Peter  21
0x1  Jan    42
0x1  Simon  25
0x1  Klaus  51
0x1  Franz  72

I'm aiming to create a DataFrame that covers any possible combination within the same ID. The only…

asked Jan 06 '21 at 19:27

Royal.Flush

1 answer

Compute and broadcast a count in pandas (with groupby transform)

How can I compute and broadcast a count in pandas? To compute a count: df.groupby('field').size() To broadcast an aggregation to the original dataframe: df.groupby('field')['field_to_aggregate'].transform(aggregation) The latter works if I specify…

python pandas aggregation split-apply-combine

asked Nov 24 '20 at 17:15

Michele Piccolini

1 answer

pandas groupby shift is not respecting the groups

I have the following DataFrame and an arbitrary function df = pd.DataFrame( {'grp': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3], 'val': [0.80485036, 0.30698609, 0.33518013, 0.12214516, 0.66355629, 0.71277808,…

python pandas dataframe split-apply-combine

asked Aug 04 '20 at 19:03

Jonathan

1 answer

Filtering pandas groupby using value of column (string datatype)

I've been working on a large genomics data set that contains multiple reads of every sample to make sure we got the data, but when analyzing it we need to drop it down to one row so we don't skew the data (count the gene as present 6 times when it…

python pandas split-apply-combine

asked Jan 29 '20 at 23:42

Andrew T

2 answers

Swift Combine - prefix publisher on array

I'm playing around with publishers in Swift/Combine, I have a function that fetches 100 records and returns them as an array. As a test I want to return just the first two items, but it's not working as I expected it to, it always returns 100, my…

swift combine split-apply-combine

asked Jan 23 '20 at 13:16

Chris

1 answer

Matlab: Use Splitapply to write multiple files

I have grouped tables by a variable and I am trying to write multiple files based on the grouping variable. But it does not work. I used findgroups and splitapply, but the splitapply is where I am having problems. Here is one version of the commands…

matlab split-apply-combine

asked Nov 19 '19 at 18:26

Hobbycoder

1 answer

Un-nest output of d3.group or d3.rollup?

I am using d3-array rollup to do group-by like counting operation, in preparation to generate an html table. I have a variable number of grouping keys, which I am passing like this: var rollup_keys = new Map([ ['count', v => v.length], …

javascript d3.js split-apply-combine

asked Jul 05 '19 at 17:46

deargle

0 answers

Matlab best practice for choosing and using splitapply, rowfun, and varfun?

Matlab seems to have a number of different code patterns for realizing SQL's GROUP BY aggregation of data. To me it seems that this makes it hard for best practice and code idioms to coalesce. Are there guidelines for which are best for which…

matlab group-by split-apply-combine

asked Jul 01 '19 at 15:11

user36800

1 answer

Combine 3 columns to one column pandas

I have the following code: input= pd.DataFrame({'Police District Name': ['WHEATON', 'SILVER SPRING', 'BETHESDA','GERMANTOWN','WHEATON','MONTGOMERY VILLAGE'], 'cn1': ['Crime Against Person', 'Crime Against Person', 'Crime Against…

python pandas split-apply-combine

asked Nov 26 '18 at 18:02

mango90001

3 answers

Create a variable whose values are with data type array and those values came from multiple columns

I would like to know how I could come up with the new variable "test_array" which is of data type array and created by combining columns "test_1" to "test_4" because I wanted to use it for further calculations. id test_1 test_2 test_3 …

arrays r split-apply-combine

asked Sep 11 '18 at 06:01

Ashtasora

2 answers

How to add totals as well as group_by statistics in R

When computing any statistic using summarise and group_by we only get the summary statistic per-category, and not the value for all the population (Total). How to get both? I am looking for something clean and short. Until now I can only think…

r dplyr split-apply-combine

asked Aug 10 '18 at 20:14

Fernando Hoces De La Guardia

0 answers

Matlab `splitapply` speed trend?

My organization is usually a few years behind the most recent Matlab version. I am finding that splitapply is extremely slow when there are many groups (two numerical grouping variables), in sharp contrast to my experience with SQL. I suspect that…

matlab split-apply-combine

asked May 14 '18 at 18:21

user36800

3 answers

Alternative to splitapply in Matlab

I am trying to run someone else's Matlab code that uses the splitapply function, which is only available in R2018a. I am currently using R2015a; is there a simple (albeit less efficient) alternative implementation which achieves the same purpose…

matlab split-apply-combine

asked Apr 02 '18 at 05:54

p-value

3 answers

Time Lag based on another variable

Given: test <- data.frame(Participant= c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3), Day = c(0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9), Value= c(1:30)) I want to arrive…

r group-by split-apply-combine

asked Dec 02 '17 at 02:13

D500

2 answers

Python Pandas Aggregate Series Data Within a DataFrame

Within a dataframe, I am trying to apply split-apply-combine to a column which contains series data element-wise. (I've searched SO but haven't found anything pertaining to series within data frames.) The data frame: import pandas as pd from pandas import…

python pandas split-apply-combine

asked Sep 06 '17 at 20:26

Mark Pedigo
