Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics.

Aggregate refers to the process of summarizing grouped data, commonly used in Statistics. Typically this involves replacing groups of data with single values (e.g. sum, mean, standard deviation, etc.). In SQL databases and data manipulation libraries such as in , this is accomplished with the use of GROUP BY and aggregate functions.

Documentation:

8256 questions
67
votes
10 answers

Select the top N values by group

This is in response to a question asked on the r-help mailing list. Here are lots of examples of how to find top values by group using sql, so I imagine it's easy to convert that knowledge over using the R sqldf package. An example: when mtcars is…
Anthony Damico
  • 5,779
  • 7
  • 46
  • 77
66
votes
4 answers

Aggregating by unique identifier and concatenating related values into a string

I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure out. I have a list of names (brand), and accompanying ID number (id). This data is in long form, so names can have multiple ID's. I'd like to…
roody
  • 2,633
  • 5
  • 38
  • 50
64
votes
6 answers

Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

I'm trying to do some aggregations on a pandas data frame. Here is a sample code: import pandas as pd df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"], "Amount": [10.0, 5.0, 8.0, 10.5, 7.5,…
Victor Mayrink
  • 1,064
  • 1
  • 13
  • 24
60
votes
3 answers

How to include BIT type column in SELECT part with out including it on the GROUP BY in T-SQL?

Here is my T-SQL query SELECT ProductID, VendorID, ProductName= MAX(ProductName), VendorName = MAX(VendorName), IsActive = MAX(IsActive) # This brings error FROM ProductVendorAssoc GROUP BY ProductID, VendorID I…
Mithun Sreedharan
  • 49,883
  • 70
  • 181
  • 236
59
votes
6 answers

Select the first and last row by group in a data frame

How can I select the first and last row for each unique id in the following dataframe? tmp <- structure(list(id = c(15L, 15L, 15L, 15L, 21L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L), d = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,…
Francesco
  • 593
  • 1
  • 4
  • 5
58
votes
11 answers

Solution for SpecificationError: nested renamer is not supported while agg() along with groupby()

def stack_plot(data, xtick, col2='project_is_approved', col3='total'): ind = np.arange(data.shape[0]) plt.figure(figsize=(20,5)) p1 = plt.bar(ind, data[col3].values) p2 = plt.bar(ind, data[col2].values) plt.ylabel('Projects') …
Akshay Jindal
  • 607
  • 1
  • 5
  • 4
58
votes
10 answers

Why is there no PRODUCT aggregate function in SQL?

Im looking for something like SELECT PRODUCT(table.price) FROM table GROUP BY table.sale similar to how SUM works. Have I missed something on the documentation, or is there really no PRODUCT function? If so, why not? Note: I looked for the function…
lock
  • 6,404
  • 18
  • 58
  • 76
56
votes
8 answers

Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form: > head(data) Group Score Info 1 1 1 a 2 1 2 b 3 1 3 c 4 2 4 d 5 2 3 e 6 2 1 f I would like to aggregate it following the Score column…
jul635
  • 794
  • 1
  • 7
  • 13
56
votes
9 answers

Explain the aggregate functionality in Spark (with Python and Scala)

I am looking for some better explanation of the aggregate functionality that is available via spark in python. The example I have is as follows (using pyspark from Spark 1.2.0 version) sc.parallelize([1,2,3,4]).aggregate( (0, 0), (lambda acc,…
ab_tech_sp
  • 943
  • 2
  • 9
  • 7
55
votes
2 answers

Aggregate multiple columns at once

I have a data-frame likeso: x <- id1 id2 val1 val2 val3 val4 1 a x 1 9 2 a x 2 4 3 a y 3 5 4 a y 4 9 5 b x 1 7 6 b y 4 4 7 b x 3 9 8 b y 2 8 I wish to aggregate the…
Rookie
  • 5,179
  • 13
  • 41
  • 65
54
votes
4 answers

Why can't indexed views have a MAX() aggregate?

I have been trying out a few index views and am impressed but I nearly always need a max or a min as well and can not understand why it doesn't work with these, can anyone explain why? I KNOW they are not allowed, I just can't understand why!!!…
alex
  • 725
  • 1
  • 6
  • 11
54
votes
2 answers

aggregate methods treat missing values (NA) differently

Here's a simple data frame with a missing value: M = data.frame( Name = c('name', 'name'), Col1 = c(NA, 1) , Col2 = c(1, 1)) # Name Col1 Col2 # 1 name NA 1 # 2 name 1 1 When I use aggregate to sum variables by group ('Name') using the…
Ryan Walker
  • 3,176
  • 1
  • 23
  • 29
50
votes
4 answers

LINQ: How do I concatenate a list of integers into comma delimited string?

It's probably something silly I missed, but I try to concatenate a list of integers instead of summing them with: integerArray.Aggregate((accumulator, piece) => accumulator+"," + piece) The compiler complained about argument error. Is there a slick…
Haoest
  • 13,610
  • 29
  • 89
  • 105
48
votes
5 answers

Aggregate Function on Uniqueidentifier (GUID)

Let's say I have the following table: category | guid ---------+----------------------- A | 5BC2... A | 6A1C... B | 92A2... Basically, I want to do the following SQL: SELECT category, MIN(guid) FROM myTable GROUP BY…
Heinzi
  • 167,459
  • 57
  • 363
  • 519
48
votes
4 answers

Pandas groupby(),agg() - how to return results without the multi index?

I have a dataframe: pe_odds[ [ 'EVENT_ID', 'SELECTION_ID', 'ODDS' ] ] Out[67]: EVENT_ID SELECTION_ID ODDS 0 100429300 5297529 18.00 1 100429300 5297529 20.00 2 100429300 5297529 21.00 3 100429300 5297529 …
Ginger
  • 8,320
  • 12
  • 56
  • 99