Questions tagged [group-by]

GROUP BY is a command in the SQL relational database standard for collapsing a group of rows that share common field value(s) into a single row. Aggregate functions can be performed on other fields in the group, such as SUM() or AVG(), to collate related data into a single value.

About

The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

Aggregate functions can be performed on other fields in the group, such as SUM() or AVG(), to collate related data into a single value.

Syntax

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name

Source

MySQL Handling of GROUP BY

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.

Source

GROUP BY (Aggregate) Functions

Related Tags :

27499 questions
292
votes
11 answers

How to use count and group by at the same select statement

I have an SQL SELECT query that also uses a GROUP BY, I want to count all the records after the GROUP BY clause filtered the resultset. Is there any way to do this directly with SQL? For example, if I have the table users and want to select the…
Stavros
  • 5,802
  • 13
  • 32
  • 45
290
votes
8 answers

Pandas DataFrame Groupby two columns and get counts

I have a pandas dataframe in the following format: df = pd.DataFrame([ [1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], …
Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121
280
votes
7 answers

pandas GroupBy columns with NaN (missing) values

I have a DataFrame with many missing values in columns which I wish to groupby: import pandas as pd import numpy as np df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.NaN, '6']}) In [4]: df.groupby('b').groups Out[4]: {'4': [0], '6':…
Gyula Sámuel Karli
  • 3,118
  • 2
  • 15
  • 18
269
votes
5 answers

JOIN two SELECT statement results

Is it possible to join the results of 2 sql SELECT statements in one statement? I have a database of tasks where each record is a separate task, with deadlines (and a PALT, which is just an INT of days from start to deadline. Age is also an INT…
sylverfyre
  • 3,049
  • 3
  • 17
  • 15
264
votes
8 answers

Pandas dataframe get first row of each group

I have a pandas DataFrame like following: df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7], 'value' : ["first","second","second","first", "second","first","third","fourth", …
Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121
258
votes
6 answers

Pandas get topmost n records within each group

Suppose I have pandas DataFrame like this: df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4], 'value':[1,2,3,1,2,3,4,1,1]}) which looks like: id value 0 1 1 1 1 2 2 1 3 3 2 1 4 2 2 5 2 3 6 2 4 7 3 …
Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
247
votes
4 answers

SQL query for finding records where count > 1

I have a table named PAYMENT. Within this table I have a user ID, an account number, a ZIP code and a date. I would like to find all records for all users that have more than one payment per day with the same account number. UPDATE: Additionally,…
Benjamin Muschko
  • 32,442
  • 9
  • 61
  • 82
241
votes
10 answers

Relative frequencies / proportions with dplyr

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of number of gears by am (automatic/manual) in one go with…
jenswirf
  • 7,087
  • 11
  • 45
  • 65
231
votes
18 answers

How can I group time by hour or by 10 minutes?

Like when I do SELECT [Date] FROM [FRIIB].[dbo].[ArchiveAnalog] GROUP BY [Date] How can I specify the group period? I'm using MS SQL 2008. I've tried this, both with % 10 and / 10. SELECT MIN([Date]) AS RecT, AVG(Value) FROM…
cnd
  • 32,616
  • 62
  • 183
  • 313
214
votes
6 answers

How to access pandas groupby dataframe by key

How do I access the corresponding groupby dataframe in a groupby object by the key? With the following groupby: rand = np.random.RandomState(1) df = pd.DataFrame({'A': ['foo', 'bar'] * 3, 'B': rand.randn(6), …
beardc
  • 20,283
  • 17
  • 76
  • 94
213
votes
6 answers

GROUP BY with MAX(DATE)

I'm trying to list the latest destination (MAX departure time) for each train in a table, for example: Train Dest Time 1 HK 10:00 1 SH 12:00 1 SZ 14:00 2 HK 13:00 2 SH …
Aries
  • 2,191
  • 2
  • 15
  • 7
203
votes
6 answers

Naming returned columns in Pandas aggregate function?

I'm having trouble with Pandas' groupby functionality. I've read the documentation, but I can't see to figure out how to apply aggregate functions to multiple columns and have custom names for those columns. This comes very close, but the data…
David Chouinard
  • 6,466
  • 8
  • 43
  • 61
200
votes
14 answers

Count number of records returned by group by

How do I count the number of records returned by a group by query, For eg: select count(*) from temptable group by column_1, column_2, column_3, column_4 Gives me, 1 1 2 I need to count the above records to get 1+1+1 = 3.
Chris
  • 2,923
  • 6
  • 26
  • 25
199
votes
10 answers

SQL - using alias in Group By

Just curious about SQL syntax. So if I have SELECT itemName as ItemName, substring(itemName, 1,1) as FirstLetter, Count(itemName) FROM table1 GROUP BY itemName, FirstLetter This would be incorrect because GROUP BY itemName, FirstLetter…
Haoest
  • 13,610
  • 29
  • 89
  • 105
186
votes
13 answers

GroupBy pandas DataFrame and select most common value

I have a data frame with three string columns. I know that the only one value in the 3rd column is valid for every combination of the first two. To clean the data I have to group by data frame by first two columns and select most common value of the…
Viacheslav Nefedov
  • 2,259
  • 3
  • 15
  • 15