Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, a common operation in statistics. Typically this involves replacing each group of data with a single value (e.g. a sum, mean, or standard deviation). In SQL databases and data manipulation libraries, this is accomplished with GROUP BY and aggregate functions.
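
As a rough illustration of the description above, the following sketch groups a few rows by a category and replaces each group with a count, a sum, and a mean using SQL's GROUP BY. The table name `samples`, its columns, and the sample rows are hypothetical, chosen only for this example; it uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical sample data: (category, value) pairs invented for illustration.
rows = [
    ("equity", 10.0),
    ("equity", 20.0),
    ("bond", 4.0),
    ("bond", 6.0),
    ("bond", 8.0),
]


def summarize(rows):
    """Return {category: (count, total, mean)} computed with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE samples (category TEXT, value REAL)")
        conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        # Each aggregate function collapses one group of rows into one value.
        cursor = conn.execute(
            "SELECT category, COUNT(*), SUM(value), AVG(value) "
            "FROM samples GROUP BY category ORDER BY category"
        )
        return {name: (n, total, mean) for name, n, total, mean in cursor}
    finally:
        conn.close()


summary = summarize(rows)
for name, (n, total, mean) in summary.items():
    print(name, n, total, mean)
```

Data manipulation libraries generally expose the same idea through their own group-by interfaces; the SQL form is shown here only because it needs nothing beyond the standard library.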

8256 questions
16
votes
4 answers

Aggregate and Weighted Mean in R

I'm trying to calculate asset-weighted returns by asset class. For the life of me, I can't figure out how to do it using the aggregate command. My data frame looks like this dat <- data.frame(company, fundname, assetclass, return, assets) I'm…
Brandon Bertelsen
  • 43,807
  • 34
  • 160
  • 255
16
votes
2 answers

Pandas aggregation ignoring NaN's

I aggregate my Pandas dataframe: data. Specifically, I want to get the average and sum amounts by tuples of [origin and type]. For averaging and summing I tried the numpy functions below: import numpy as np import pandas as pd result =…
Zhubarb
  • 11,432
  • 18
  • 75
  • 114
16
votes
3 answers

Django aggregate Count only True values

I'm using aggregate to get the count of a column of booleans. I want the number of True values. DJANGO CODE: count = Model.objects.filter(id=pk).aggregate(bool_col=Count('my_bool_col')) This returns the count of all rows. SQL QUERY SHOULD BE: SELECT…
Pietro
  • 1,815
  • 2
  • 29
  • 63
16
votes
5 answers

Multiple functions in a single tapply or aggregate statement

Is it possible to include two functions within a single tapply or aggregate statement? Below I use two tapply statements and two aggregate statements: one for mean and one for SD. I would prefer to combine the statements. my.Data = read.table(text =…
Mark Miller
  • 12,483
  • 23
  • 78
  • 132
15
votes
1 answer

python pandas: diff between 2 dates in a groupby

Using Python 3.6 and Pandas 0.19.2: I have a DataFrame containing parsed log files for transactions. Each line is timestamped, contains a transactionid, and can either represent the beginning or the end of a transaction (so each transactionid has 1…
Guillaume
  • 5,497
  • 3
  • 24
  • 42
15
votes
2 answers

Multiple return values (structured bindings) with unmovable types and guaranteed RVO in C++17

With C++ 17, we will have the possibility to return unmovable (including uncopyable) types such as std::mutex, via what can be thought of as guaranteed return value optimization (RVO): Guaranteed copy elision through simplified value categories:…
Johan Lundberg
  • 26,184
  • 12
  • 71
  • 97
15
votes
3 answers

How to sum in pandas by unique index in several columns?

I have a pandas DataFrame which details online activities in terms of "clicks" during a user session. There are as many as 50,000 unique users, and the dataframe has around 1.5 million samples. Obviously most users have multiple records. The four…
ShanZhengYang
  • 16,511
  • 49
  • 132
  • 234
15
votes
2 answers

Performance of max() vs ORDER BY DESC + LIMIT 1

I was troubleshooting a few slow SQL queries today and don't quite understand the performance difference below: When trying to extract the max(timestamp) from a data table based on some condition, using MAX() is slower than ORDER BY timestamp LIMIT…
Geotob
  • 2,847
  • 1
  • 16
  • 26
15
votes
6 answers

SQL group by day, show orders for each day

I have an SQL 2005 table, let's call it Orders, in the format: OrderID, OrderDate, OrderAmount 1, 25/11/2008, 10 2, 25/11/2008, 2 3, 30/10/2008, 5 Then I need to produce a report table showing the ordered amount on each day in…
Radu094
  • 28,068
  • 16
  • 63
  • 80
15
votes
3 answers

Pandas: apply different functions to different columns

When using df.mean() I get a result where the mean for each column is given. Now let's say I want the mean of the first column, and the sum of the second. Is there a way to do this? I don't want to have to disassemble and reassemble the…
pbreach
  • 16,049
  • 27
  • 82
  • 120
14
votes
5 answers

Aggregating sub totals and grand totals with data.table

I've got a data.table in R: library(data.table) set.seed(1) DT = data.table( group=sample(letters[1:2],100,replace=TRUE), year=sample(2010:2012,100,replace=TRUE), v=runif(100)) Aggregating this data into a summary table by group and year is…
Zach
  • 29,791
  • 35
  • 142
  • 201
14
votes
2 answers

DDD, Entity Framework, Aggregate Entity Behavior ( Person.AddEmail, etc)

Here's a simple example of a problem I'm running across that is not meshing with some of the ideas presented here and other places regarding DDD. Say I have an ASP.NET MVC 3 site that creates/manipulates a person. The controllers access an…
14
votes
3 answers

How to filter rows for a specific aggregate with spark sql?

Normally all rows in a group are passed to an aggregate function. I would like to filter rows using a condition so that only some rows within a group are passed to an aggregate function. Such operation is possible with PostgreSQL. I would like to do…
Marcin Król
  • 1,555
  • 2
  • 16
  • 31
14
votes
4 answers

Earliest Date for each id in R

I have a dataset where each individual (id) has an e_date, and since each individual could have more than one e_date, I'm trying to get the earliest date for each individual. So basically I would like to have a dataset with one row per each id…
pietro
  • 143
  • 1
  • 1
  • 5
14
votes
2 answers

Summarize data.table by group

I am working with a huge data table in R containing monthly measurements of temperature for multiple locations, taken by different sources. The dataset looks like this: library(data.table) # Generate random data: loc <- 1:10 dates <-…
thiagoveloso
  • 2,537
  • 3
  • 28
  • 57