Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, a common operation in statistics. Typically this involves replacing each group of data with a single value (e.g. sum, mean, or standard deviation). In SQL databases this is done with GROUP BY and aggregate functions; data manipulation libraries such as pandas in Python provide equivalent group-by operations.
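As a rough illustration of the description above, here is a minimal sketch of grouping and aggregating using SQLite from Python's standard library. The table name `orders`, its columns, and the sample values are hypothetical, invented only for this example; a real schema will differ.

```python
import sqlite3

# Hypothetical sample data (order_day, amount), invented for illustration.
rows = [
    ("2008-11-25", 10),
    ("2008-11-25", 2),
    ("2008-10-30", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# GROUP BY collapses every group of rows sharing an order_day into one
# summary row; COUNT, SUM and AVG are the aggregate functions applied.
summary = conn.execute(
    """
    SELECT order_day, COUNT(*), SUM(amount), AVG(amount)
    FROM orders
    GROUP BY order_day
    ORDER BY order_day
    """
).fetchall()
conn.close()

for order_day, n_orders, total, mean in summary:
    print(order_day, n_orders, total, mean)
```

Each distinct `order_day` yields exactly one output row, which is the "replace a group with a single value" idea; the same pattern applies with other aggregate functions such as MIN or MAX.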

8256 questions

4 answers

Aggregate and Weighted Mean in R

I'm trying to calculate asset-weighted returns by asset class. For the life of me, I can't figure out how to do it using the aggregate command. My data frame looks like this: dat <- data.frame(company, fundname, assetclass, return, assets) I'm…

r aggregate

asked Jul 29 '10 at 21:43

Brandon Bertelsen

43,807
34
160
255

2 answers

Pandas aggregation ignoring NaN's

I aggregate my Pandas dataframe: data. Specifically, I want to get the average and sum amounts by tuples of [origin and type]. For averaging and summing I tried the numpy functions below: import numpy as np import pandas as pd result =…

python numpy pandas aggregate nan

asked Oct 01 '14 at 16:01

Zhubarb

11,432
18
75
114

3 answers

Django aggregate Count only True values

I'm using aggregate to get the count of a column of booleans. I want the number of True values. DJANGO CODE: count = Model.objects.filter(id=pk).aggregate(bool_col=Count('my_bool_col')) This returns the count of all rows. SQL QUERY SHOULD BE: SELECT…

python django aggregate

asked Aug 11 '14 at 19:00

Pietro

1,815
2
29
63

5 answers

Multiple functions in a single tapply or aggregate statement

Is it possible to include two functions within a single tapply or aggregate statement? Below I use two tapply statements and two aggregate statements: one for mean and one for SD. I would prefer to combine the statements. my.Data = read.table(text =…

r aggregate tapply

asked Mar 05 '13 at 03:02

Mark Miller

12,483
23
78
132

1 answer

python pandas: diff between 2 dates in a groupby

Using Python 3.6 and Pandas 0.19.2: I have a DataFrame containing parsed log files for transactions. Each line is timestamped, contains a transactionid, and can either represent the beginning or the end of a transaction (so each transactionid has 1…

python pandas group-by aggregate

asked Apr 25 '17 at 12:54

Guillaume

5,497
3
24
42

2 answers

Multiple return values (structured bindings) with unmovable types and guaranteed RVO in C++17

With C++17, we will have the possibility to return unmovable (including uncopyable) types such as std::mutex, via what can be thought of as guaranteed return value optimization (RVO): Guaranteed copy elision through simplified value categories:…

c++ aggregate c++17 rvo

asked Jul 14 '16 at 22:51

Johan Lundberg

26,184
12
71
97

3 answers

How to sum in pandas by unique index in several columns?

I have a pandas DataFrame which details online activities in terms of "clicks" during a user session. There are as many as 50,000 unique users, and the dataframe has around 1.5 million samples. Obviously most users have multiple records. The four…

python pandas sum aggregate

asked Feb 10 '16 at 05:44

ShanZhengYang

16,511
49
132
234

2 answers

Performance of max() vs ORDER BY DESC + LIMIT 1

I was troubleshooting a few slow SQL queries today and don't quite understand the performance difference below: When trying to extract the max(timestamp) from a data table based on some condition, using MAX() is slower than ORDER BY timestamp LIMIT…

sql postgresql max aggregate sql-limit

asked Dec 12 '15 at 23:54

Geotob

2,847
1
16
26

6 answers

SQL group by day, show orders for each day

I have an SQL 2005 table, let's call it Orders, in the format: OrderID, OrderDate, OrderAmount 1, 25/11/2008, 10 2, 25/11/2008, 2 3, 30/1002008, 5 Then I need to produce a report table showing the ordered amount on each day in…

sql sql-server-2005 aggregate

asked Nov 30 '08 at 22:06

Radu094

28,068
16
63
80

3 answers

Pandas: apply different functions to different columns

When using df.mean() I get a result where the mean for each column is given. Now let's say I want the mean of the first column, and the sum of the second. Is there a way to do this? I don't want to have to disassemble and reassemble the…

python pandas aggregate

asked Oct 17 '14 at 22:12

pbreach

16,049
27
82
120

5 answers

Aggregating sub totals and grand totals with data.table

I've got a data.table in R: library(data.table) set.seed(1) DT = data.table( group=sample(letters[1:2],100,replace=TRUE), year=sample(2010:2012,100,replace=TRUE), v=runif(100)) Aggregating this data into a summary table by group and year is…

r aggregate plyr data.table

asked Feb 16 '12 at 16:41

Zach

29,791
35
142
201

2 answers

DDD, Entity Framework, Aggregate Entity Behavior ( Person.AddEmail, etc)

Here's a simple example of a problem I'm running across that is not meshing with some of the ideas presented here and other places regarding DDD. Say I have an ASP.NET MVC 3 site that creates/manipulates a person. The controllers access an…

entity-framework domain-driven-design entity aggregate anemic-domain-model

asked Jun 15 '11 at 18:48

user800131

3 answers

How to filter rows for a specific aggregate with spark sql?

Normally all rows in a group are passed to an aggregate function. I would like to filter rows using a condition so that only some rows within a group are passed to an aggregate function. Such operation is possible with PostgreSQL. I would like to do…

sql apache-spark apache-spark-sql aggregate

asked Sep 26 '16 at 22:25

Marcin Król

1,555
2
16
31

4 answers

Earliest Date for each id in R

I have a dataset where each individual (id) has an e_date, and since each individual could have more than one e_date, I'm trying to get the earliest date for each individual. So basically I would like to have a dataset with one row per each id…

r date aggregate

asked Aug 11 '16 at 10:24

pietro

2 answers

Summarize data.table by group

I am working with a huge data table in R containing monthly measurements of temperature for multiple locations, taken by different sources. The dataset looks like this: library(data.table) # Generate random data: loc <- 1:10 dates <-…

r data.table aggregate mean

asked Apr 10 '16 at 05:22

thiagoveloso

2,537
3
28
57
