Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition. It applies only to questions about the `pandas` library.

pandas.DataFrame.groupby lets you split the rows of a DataFrame into groups according to the values in one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
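As a rough illustration of the description above, here is a minimal sketch of grouping and then aggregating. The column names `team` and `score` and the data are made up for this example; they do not come from any question on this page.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 20, 30, 40, 50],
})

# Split the rows by "team", then take the mean of "score" within each group
mean_by_team = df.groupby("team")["score"].mean()

# Several aggregations can be computed in one pass with agg()
summary = df.groupby("team")["score"].agg(["min", "max", "count"])

print(mean_by_team)
print(summary)
```

The result of `groupby(...).mean()` is indexed by the group keys, so `mean_by_team["a"]` looks up the mean for group `a`.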

8780 questions
21
votes
2 answers

pandas get minimum of one column in group when groupby another

I have a pandas dataframe that looks like this: c y 0 9 0 1 8 0 2 3 1 3 6 2 4 1 3 5 2 3 6 5 3 7 4 4 8 0 4 9 7 4 I'd like to groupby y and get the min and max…
MetaStack
  • 3,266
  • 4
  • 30
  • 67
21
votes
4 answers

How to keep original index of a DataFrame after groupby 2 columns?

Is there any way I can retain the original index of my large dataframe after I perform a groupby? The reason I need to do this is that I need to do an inner merge back to my original df (after my groupby) to regain those lost columns. And the index…
Hana
  • 1,330
  • 4
  • 23
  • 38
21
votes
1 answer

Add column for percentage of total to Pandas dataframe

I have a dataframe that I am doing a groupby() on to get the counts on a column's values. I am trying to add an additional column for "Percentage of Total". I'm not sure how to accomplish that. I've looked at a few groupby options, but can't seem…
AlliDeacon
  • 1,365
  • 3
  • 21
  • 35
21
votes
2 answers

Pandas Groupby take only the first N Groups

I have some DataFrame which I want to group by the ID, e.g.: import pandas as pd df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'], 'user_id': [1,2,1,1,3,1,5]}) print df Which generates: item_id user_id 0 a 1 1 …
Christian Sauer
  • 10,351
  • 10
  • 53
  • 85
20
votes
4 answers

Pandas groupby multiple columns, list of multiple columns

I have the following data: InvoiceNo StockCode Description Quantity CustomerID Country 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 17850 United Kingdom 536365 71053 WHITE METAL LANTERN…
GrandmasLove
  • 465
  • 1
  • 4
  • 14
20
votes
1 answer

Get unique values of multiple columns as a new dataframe in pandas

Having a pandas data frame df with at least columns C1, C2, C3, how would you get all the unique C1, C2, C3 values as a new DataFrame? In other words, similar to: SELECT C1,C2,C3 FROM T GROUP BY C1,C2,C3 Tried that print…
Ofek Ron
  • 8,354
  • 13
  • 55
  • 103
20
votes
3 answers

Python Pandas: Assign Last Value of DataFrame Group to All Entries of That Group

In Python Pandas, I have a DataFrame. I group this DataFrame by a column and want to assign the last value of a column to all rows of another column. I know that I am able to select the last row of the group by this command: import pandas as pd df…
user7450524
20
votes
4 answers

Sampling one record per unique value (pandas, python)

I work with python-pandas dataframes, and I have a large dataframe containing users and their data. Each user can have multiple rows. I want to sample 1-row per user. My current solution seems not efficient: df1 = pd.DataFrame({'User': ['user1',…
Ruslan
  • 911
  • 2
  • 11
  • 28
19
votes
1 answer

Why is groupby so fast?

This is a follow up question to this one, where jezrael used pandas.DataFrame.groupby to increment by a factor of some hundreds the speed of a list creation. Specifically, let df be a large dataframe, then index = list(set(df.index)) list_df =…
19
votes
3 answers

Pandas groupby multiple columns, with pct_change

I'm trying to find the period-over-period growth in Value for each unique group, grouped by (Company, Group, and Date). Company Group Date Value A X 2015-01 1 A X 2015-02 2 A X 2015-03 1.5 A XX 2015-01 …
user3357979
  • 607
  • 1
  • 5
  • 12
19
votes
5 answers

How to bin time in a pandas dataframe

I am trying to analyze average daily fluctuations in a measurement "X" over several weeks using pandas dataframes; however, timestamps/datetimes etc. are proving particularly hellish to deal with. Having spent a good few hours trying to work this out…
Josh
  • 321
  • 1
  • 2
  • 6
19
votes
1 answer

pandas group by year, rank by sales column, in a dataframe with duplicate data

I would like to create a rank on year (so in year 2012, Manager B is 1. In 2011, Manager B is 1 again). I struggled with the pandas rank function for a while and DO NOT want to resort to a for loop. s =…
Ben
  • 381
  • 1
  • 3
  • 15
18
votes
5 answers

Pandas transform inconsistent behavior for list

I have a sample snippet that works as expected: import pandas as pd df = pd.DataFrame(data={'label': ['a', 'b', 'b', 'c'], 'wave': [1, 2, 3, 4], 'y': [0,0,0,0]}) df['new'] = df.groupby(['label'])[['wave']].transform(tuple) The result is: label …
Quant Christo
  • 1,275
  • 9
  • 23
18
votes
2 answers

How to use pandas Grouper on multiple keys?

I need to groupby-transform a dataframe by a datetime column AND another str(object) column to apply a function by group and assign the result to each of the row members of the group. I understand the groupby workflow but cannot make a pandas.Grouper…
pablete
  • 1,030
  • 1
  • 12
  • 21
18
votes
1 answer

Difference between "as_index = False", and "reset_index()" in pandas groupby

I just wanted to know what is the difference in the function performed by these 2. Data: import pandas as pd df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]}) as_index=False : df_group1 =…
Rohith
  • 1,008
  • 3
  • 8
  • 19