Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics.

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics. Typically this involves replacing groups of data with single values (e.g. sum, mean, standard deviation, etc.). In SQL databases and data manipulation libraries such as pandas in python, this is accomplished with the use of GROUP BY and aggregate functions.

Documentation:

8256 questions

143

votes

5 answers

Pandas aggregate count distinct

Let's say I have a log of user activity and I want to generate a report of the total duration and the number of unique users per day. import numpy as np import pandas as pd df = pd.DataFrame({'date':…

python pandas group-by aggregate

asked Sep 01 '13 at 03:25

dave

1,613
2
11
9

118

votes

8 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…

r dplyr data.table aggregate

asked Jun 05 '14 at 21:45

Ed S

1,293
2
9
6

110

votes

4 answers

Pandas sum by groupby, but exclude certain columns

What is the best way to do a groupby on a Pandas dataframe, but exclude some columns from that groupby? e.g. I have the following dataframe: Code Country Item_Code Item Ele_Code Unit Y1961 Y1962 Y1963 2 Afghanistan 15 …

python pandas group-by aggregate

asked Sep 23 '15 at 23:45

user308827

21,227
87
254
417

109

votes

16 answers

Pass percentiles to pandas agg function

I want to pass the numpy percentile() function through pandas' agg() function as I do below with various other numpy statistics functions. Right now I have a dataframe that looks like this: AGGREGATE MY_COLUMN A 10 A 12 B …

python pandas numpy aggregate

asked Jul 10 '13 at 18:22

slizb

5,742
4
25
22

108

votes

6 answers

Collapse / concatenate / aggregate a column to a single comma separated string within each group

I want to aggregate one column in a data frame according to two grouping variables, and separate the individual values by a comma. Here is some data: data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10)) data # A B …

r aggregate r-faq

asked Apr 10 '13 at 18:47

linp

1,427
2
12
13

107

votes

7 answers

Apply several summary functions (sum, mean, etc.) on several variables by group in one call

I have the following data frame x <- read.table(text = " id1 id2 val1 val2 1 a x 1 9 2 a x 2 4 3 a y 3 5 4 a y 4 9 5 b x 1 7 6 b y 4 4 7 b x 3 9 8 b y 2 8", header =…

r aggregate r-faq

asked Aug 21 '12 at 22:55

broccoli

4,738
10
42
54

106

votes

4 answers

ListAGG in SQLSERVER

I'm trying to aggregate a 'STRING' field in SQLServer. I would like to find the same function LISTAGG like in Oracle . Do you know how to do the same function or an another method? For Example, Field A | Field B 1 | A 1 | B 2 | …

sql sql-server aggregate

asked Mar 18 '13 at 13:14

user1557642

1,083
2
10
6

100

votes

5 answers

Keep other columns when doing groupby

I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this: df1 = df.groupby("item", as_index=False)["diff"].min() However, if I have more than those two columns, the other…

python pandas aggregate pandas-groupby

asked Apr 30 '14 at 17:29

PointXIV

1,258
2
15
23

votes

3 answers

Extract the maximum value within each group in a dataframe

I have a data frame with a grouping variable ("Gene") and a value variable ("Value"): Gene Value A 12 A 10 B 3 B 5 B 6 C 1 D 3 D 4 For each level of my grouping variable, I wish to extract the maximum…

r aggregate

asked Aug 14 '14 at 17:45

Johnathan

1,877
4
23
29

votes

3 answers

Add count of unique / distinct values by group to the original data

I wish to count the number of unique values by grouping of a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this: color type 1 black chair 2 black chair 3…

r count unique aggregate

asked Jul 02 '13 at 09:18

Bryan

1,771
4
17
30

votes

7 answers

How does a site like kayak.com aggregate content?

Greetings, I've been toying with an idea for a new project and was wondering if anyone has any idea on how a service like Kayak.com is able to aggregate data from so many sources so quickly and accurately. More specifically, do you think Kayak.com…

api architecture screen-scraping aggregate

asked Jan 05 '11 at 17:27

Jeff

2,818
3
29
31

votes

6 answers

Python Pandas: Is Order Preserved When Using groupby() and agg()?

I've frequented used pandas' agg() function to run summary statistics on every column of a data.frame. For example, here's how you would produce the mean and standard deviation: df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2',…

python pandas aggregate

asked Oct 19 '14 at 22:31

BringMyCakeBack

1,499
2
12
16

votes

4 answers

Name columns within aggregate in R

I know I can *re*name columns after I aggregate the data: blubb <- aggregate(dat$two ~ dat$one, ...) colnames(blubb) <- c("One", "Two") Nothing wrong with that. But is there a way to aggregate and name the columns in one go? Sort of like: blubb <-…

r rename aggregate

asked Mar 09 '13 at 09:31

user1322720

votes

11 answers

Count number of rows per group and add result to original data frame

Say I have a data.frame object: df <- data.frame(name=c('black','black','black','red','red'), type=c('chair','chair','sofa','sofa','plate'), num=c(4,5,12,4,3)) Now I want to count the number of rows (observations)…

r count aggregate r-faq

asked Sep 16 '11 at 21:33

Uri Laserson

2,391
5
30
39

votes

4 answers

data.frame Group By column

I have a data frame DF. Say DF is: A B 1 1 2 2 1 3 3 2 3 4 3 5 5 3 6 Now I want to combine together the rows by the column A and to have the sum of the column B. For example: A B 1 1 5 2 2 3 3 3 11 I am doing this currently using an SQL query…

r aggregate

asked Sep 14 '13 at 08:36

nikosdi

2,138
5
26
35

Prev 1

…

99 100 Next