Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics.

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics. Typically this involves replacing groups of data with single values (e.g. sum, mean, standard deviation, etc.). In SQL databases and data manipulation libraries such as in , this is accomplished with the use of GROUP BY and aggregate functions.

Documentation:

8256 questions
143
votes
5 answers

Pandas aggregate count distinct

Let's say I have a log of user activity and I want to generate a report of the total duration and the number of unique users per day. import numpy as np import pandas as pd df = pd.DataFrame({'date':…
dave
  • 1,613
  • 2
  • 11
  • 9
118
votes
8 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…
Ed S
  • 1,293
  • 2
  • 9
  • 6
110
votes
4 answers

Pandas sum by groupby, but exclude certain columns

What is the best way to do a groupby on a Pandas dataframe, but exclude some columns from that groupby? e.g. I have the following dataframe: Code Country Item_Code Item Ele_Code Unit Y1961 Y1962 Y1963 2 Afghanistan 15 …
user308827
  • 21,227
  • 87
  • 254
  • 417
109
votes
16 answers

Pass percentiles to pandas agg function

I want to pass the numpy percentile() function through pandas' agg() function as I do below with various other numpy statistics functions. Right now I have a dataframe that looks like this: AGGREGATE MY_COLUMN A 10 A 12 B …
slizb
  • 5,742
  • 4
  • 25
  • 22
108
votes
6 answers

Collapse / concatenate / aggregate a column to a single comma separated string within each group

I want to aggregate one column in a data frame according to two grouping variables, and separate the individual values by a comma. Here is some data: data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10)) data # A B …
linp
  • 1,427
  • 2
  • 12
  • 13
107
votes
7 answers

Apply several summary functions (sum, mean, etc.) on several variables by group in one call

I have the following data frame x <- read.table(text = " id1 id2 val1 val2 1 a x 1 9 2 a x 2 4 3 a y 3 5 4 a y 4 9 5 b x 1 7 6 b y 4 4 7 b x 3 9 8 b y 2 8", header =…
broccoli
  • 4,738
  • 10
  • 42
  • 54
106
votes
4 answers

ListAGG in SQLSERVER

I'm trying to aggregate a 'STRING' field in SQLServer. I would like to find the same function LISTAGG like in Oracle . Do you know how to do the same function or an another method? For Example, Field A | Field B 1 | A 1 | B 2 | …
user1557642
  • 1,083
  • 2
  • 10
  • 6
100
votes
5 answers

Keep other columns when doing groupby

I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this: df1 = df.groupby("item", as_index=False)["diff"].min() However, if I have more than those two columns, the other…
PointXIV
  • 1,258
  • 2
  • 15
  • 23
86
votes
3 answers

Extract the maximum value within each group in a dataframe

I have a data frame with a grouping variable ("Gene") and a value variable ("Value"): Gene Value A 12 A 10 B 3 B 5 B 6 C 1 D 3 D 4 For each level of my grouping variable, I wish to extract the maximum…
Johnathan
  • 1,877
  • 4
  • 23
  • 29
86
votes
3 answers

Add count of unique / distinct values by group to the original data

I wish to count the number of unique values by grouping of a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this: color type 1 black chair 2 black chair 3…
Bryan
  • 1,771
  • 4
  • 17
  • 30
84
votes
7 answers

How does a site like kayak.com aggregate content?

Greetings, I've been toying with an idea for a new project and was wondering if anyone has any idea on how a service like Kayak.com is able to aggregate data from so many sources so quickly and accurately. More specifically, do you think Kayak.com…
Jeff
  • 2,818
  • 3
  • 29
  • 31
77
votes
6 answers

Python Pandas: Is Order Preserved When Using groupby() and agg()?

I've frequented used pandas' agg() function to run summary statistics on every column of a data.frame. For example, here's how you would produce the mean and standard deviation: df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2',…
BringMyCakeBack
  • 1,499
  • 2
  • 12
  • 16
71
votes
4 answers

Name columns within aggregate in R

I know I can *re*name columns after I aggregate the data: blubb <- aggregate(dat$two ~ dat$one, ...) colnames(blubb) <- c("One", "Two") Nothing wrong with that. But is there a way to aggregate and name the columns in one go? Sort of like: blubb <-…
user1322720
67
votes
11 answers

Count number of rows per group and add result to original data frame

Say I have a data.frame object: df <- data.frame(name=c('black','black','black','red','red'), type=c('chair','chair','sofa','sofa','plate'), num=c(4,5,12,4,3)) Now I want to count the number of rows (observations)…
Uri Laserson
  • 2,391
  • 5
  • 30
  • 39
67
votes
4 answers

data.frame Group By column

I have a data frame DF. Say DF is: A B 1 1 2 2 1 3 3 2 3 4 3 5 5 3 6 Now I want to combine together the rows by the column A and to have the sum of the column B. For example: A B 1 1 5 2 2 3 3 3 11 I am doing this currently using an SQL query…
nikosdi
  • 2,138
  • 5
  • 26
  • 35