Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition, and only for questions that involve the `pandas` library.

`pandas.DataFrame.groupby` splits the rows of a DataFrame into groups according to the values in one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
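As a minimal sketch of the grouping and aggregation described above: the DataFrame, its column names (`team`, `score`), and the output names (`total`, `n`) are made up for illustration and do not come from any question listed here.

```python
import pandas as pd

# Hypothetical data: two teams with two scores each.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1, 3, 10, 20],
})

# Group rows by `team`, then take the mean of `score` within each group.
means = df.groupby("team")["score"].mean()

# Several aggregates at once, using named aggregation
# (output column name = (input column, aggregation)).
summary = df.groupby("team").agg(
    total=("score", "sum"),
    n=("score", "count"),
)

print(means)
print(summary)
```

Both results are indexed by the group key (`team`); calling `.reset_index()` on either turns the key back into an ordinary column.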

8780 questions
13
votes
2 answers

Pandas dataframe to dict of dict

Given the following pandas data frame: ColA ColB ColC 0 a1 t 1 1 a2 t 2 2 a3 d 3 3 a4 d 4 I want to get a dictionary of dictionary. But I managed to create the following only: d = {t : [1, 2], d : [3,…
Homap
  • 2,142
  • 5
  • 24
  • 34
13
votes
1 answer

Weird behaviour with groupby on ordered categorical columns

MCVE df = pd.DataFrame({ 'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'], 'ID': [1, 1, 1, 2, 2, 2] }) df.Cat = pd.Categorical( df.Cat, categories=['R64', 'SF', 'F', 'W'], ordered=True) As you can see, I've defined an ordered categorical…
cs95
  • 379,657
  • 97
  • 704
  • 746
13
votes
4 answers

Groupby sum and count on multiple columns in python

I have a pandas dataframe that looks like this ID country month revenue profit ebit 234 USA 201409 10 5 3 344 USA 201409 9 7 2 532 UK 201410 20 10 5 129 Canada …
N91
  • 395
  • 1
  • 3
  • 14
13
votes
1 answer

pandas dataframe filter to return True for ALL rows. how?

Hi, I have a filter 'm' set that is flexible enough for me to change. Sometimes I want to filter by Car or x_acft_body, or any of the various other fields, etc. Sometimes I want to have all of the rows returned by commenting and uncommenting the…
ihightower
  • 3,093
  • 6
  • 34
  • 49
12
votes
5 answers

Groupby Roll up or Roll Down for any kind of aggregates

TL;DR: How can we achieve something similar to Group By Roll Up with any kind of aggregates in pandas? (Credit to @Scott Boston for this term) I have following dataframe: P Q R S T 0 PLAC NR F HOL F 1 PLAC NR F NHOL F 2 …
ThePyGuy
  • 17,779
  • 5
  • 18
  • 45
12
votes
1 answer

Pandas: Sort a dataframe based on multiple columns

I know that this question has been asked several times, but none of the answers match my case. I have a pandas dataframe with the columns department and employee_count. I need to sort the employee_count column in descending order. But if there is a tie…
Impromptu_Coder
  • 425
  • 3
  • 7
  • 27
12
votes
5 answers

Python 3 pandas.groupby.filter

I am trying to perform a groupby filter that is very similar to the example in this documentation: pandas groupby filter >>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', ... 'foo', 'bar'], ... …
FinProg
  • 155
  • 1
  • 7
12
votes
3 answers

Aggregate unique values from multiple columns with pandas GroupBy

I went into countless threads (1 2 3...) and still I don't find a solution to my problem... I have a dataframe like this: prop1 prop2 prop3 prop4 L30 3 bob 11.2 L30 54 bob 10 L30 11 john 10 L30 10 bob …
Nithrynx
  • 123
  • 1
  • 1
  • 6
12
votes
7 answers

Faster alternative to perform pandas groupby operation

I have a dataset with name (person_name), day and color (shirt_color) as columns. Each person wears a shirt with a certain color on a particular day. The number of days can be arbitrary. E.g. input: name day color ---------------- John 1 …
astrobiologist
  • 183
  • 1
  • 1
  • 7
12
votes
1 answer

Pandas - Add Column Name to Results of groupby

I would like to add column names to the results of a groupby on a DataFrame in Python 3.6. I tried this code: import pandas as pd d = {'timeIndex': [1, 1, 1, 1, 2, 2, 2], 'isZero': [0,0,0,1,0,0,0]} df = pd.DataFrame(data=d) df2 =…
Jacob Quisenberry
  • 1,131
  • 3
  • 20
  • 48
12
votes
2 answers

Assign Unique Numeric Group IDs to Groups in Pandas

I've consistently run into this issue of having to assign a unique ID to each group in a data set. I've used this when zero padding for RNN's, generating graphs, and many other occasions. This can usually be done by concatenating the values in each…
seeiespi
  • 3,628
  • 2
  • 35
  • 37
12
votes
4 answers

How to get the first group in a groupby of multiple columns?

I've been trying to figure out how I can return just the first group after I apply groupby. My code looks like this: gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum() What I want is to output just that first group. I've been…
Hana
  • 1,330
  • 4
  • 23
  • 38
12
votes
1 answer

Merge rows within a group together

I have a pandas DataFrame where some pairs of rows have the same ID but different name. What I want is to reduce the row pair to one row, and display both of their names. INPUT: ID NAME AGE 149 Bob 32 150 Tom 53 150 Roberts …
Landmaster
  • 1,043
  • 2
  • 13
  • 21
12
votes
2 answers

Keep columns after a groupby in an empty dataframe

The dataframe is empty after a query. When I call groupby on it, it raises a runtime warning and then returns another empty dataframe with no columns. How can I keep the columns? df = pd.DataFrame(columns=["PlatformCategory","Platform","ResClassName","Amount"]) print…
user2890059
  • 145
  • 1
  • 6
12
votes
3 answers

Pandas select rows if ID appears several times

I have a table like this: CustID Purchase Time A Item1 01/01/2011 B Item2 01/01/2011 C Item1 01/02/2011 A Item2 03/01/2011 I would like to select the rows whose CustID appears more than once in the table.
Hai Vu
  • 197
  • 1
  • 9