Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition. Use it only for questions relevant to the `pandas` library.

`pandas.DataFrame.groupby` splits the rows of a DataFrame into groups according to the values in one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
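As a minimal sketch of the grouping described here, the snippet below groups a small DataFrame by one column and computes per-group statistics. The data and the column names `team` and `score` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 3, 2, 4, 6],
})

# Mean of "score" within each "team" group.
means = df.groupby("team")["score"].mean()

# Several aggregations at once; the result has one row per group.
summary = df.groupby("team")["score"].agg(["mean", "sum", "count"])

print(means)
print(summary)
```

Selecting the column before aggregating (`["score"]`) keeps the result limited to the values of interest instead of aggregating every remaining column.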

8780 questions
49
votes
5 answers

What's the equivalent of pandas' value_counts() in PySpark?

I have the following python/pandas command: df.groupby('Column_Name').agg(lambda x: x.value_counts().max()) where I am getting the value counts for ALL columns in a DataFrameGroupBy object. How do I do this action in PySpark?
TSAR
  • 683
  • 1
  • 6
  • 8
48
votes
2 answers

How to do group by on a multiindex in pandas?

Below is my dataframe. I made some transformations to create the category column and dropped the original column it was derived from. Now I need to do a group-by to remove the duplicates, e.g. Love and Fashion can be rolled up via a groupby…
Tampa
  • 75,446
  • 119
  • 278
  • 425
44
votes
6 answers

Pandas groupby with categories with redundant nan

I am having issues using pandas groupby with categorical data. Theoretically, it should be super efficient: you are grouping and indexing via integers rather than strings. But it insists that, when grouping by multiple categories, every combination…
jpp
  • 159,742
  • 34
  • 281
  • 339
44
votes
3 answers

Pandas, groupby and count

I have a dataframe say like this >>> df = pd.DataFrame({'user_id':['a','a','s','s','s'], 'session':[4,5,4,5,5], 'revenue':[-1,0,1,2,1]}) >>> df revenue session user_id 0 -1 4 a 1 …
GoingMyWay
  • 16,802
  • 32
  • 96
  • 149
44
votes
3 answers

get first and last values in a groupby

I have a dataframe df df = pd.DataFrame(np.arange(20).reshape(10, -1), [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']], ['X', 'Y']) How…
Brian
  • 1,555
  • 3
  • 16
  • 23
43
votes
1 answer

Transform vs. aggregate in Pandas

When grouping a Pandas DataFrame, when should I use transform and when should I use aggregate? How do they differ with respect to their application in practice and which one do you consider more important?
Sylvi0202
  • 901
  • 2
  • 9
  • 13
42
votes
3 answers

Pandas GroupBy.apply method duplicates first group

My first SO question: I am confused about this behavior of the apply method of groupby in pandas (0.12.0-4): it appears to apply the function TWICE to the first row of a data frame. For example: >>> from pandas import Series, DataFrame >>> import pandas…
41
votes
6 answers

Python Pandas: Calculate moving average within group

I have a dataframe containing time series for 100 objects: object period value 1 1 24 1 2 67 ... 1 1000 56 2 1 59 2 2 46 ... 2 1000 64 3 1 54 ... 100 1 …
Alexandr Kapshuk
  • 1,380
  • 2
  • 13
  • 29
40
votes
5 answers

Groupby class and count missing values in features

I have a problem and I cannot find any solution on the web or in the documentation, even though I think it is very trivial. What do I want to do? I have a dataframe like this CLASS FEATURE1 FEATURE2 FEATURE3 X A NaN NaN X NaN …
codlix
  • 858
  • 1
  • 8
  • 24
40
votes
3 answers

pandas groupby dropping columns

I'm doing a simple group by operation, trying to compare group means. As you can see below, I have selected specific columns from a larger dataframe, from which all missing values have been removed. But when I group by, I am losing a couple of…
user3334415
  • 473
  • 1
  • 6
  • 7
40
votes
3 answers

Python Pandas Conditional Sum with Groupby

Using sample data: df = pd.DataFrame({'key1' : ['a','a','b','b','a'], 'key2' : ['one', 'two', 'one', 'two', 'one'], 'data1' : np.random.randn(5), 'data2' : np.random.randn(5)}) df data1 data2…
AllenQ
  • 1,659
  • 2
  • 16
  • 18
39
votes
6 answers

How can I group by month from a date field using Python and Pandas?

I have a dataframe, df, which is as follows: | date | Revenue | |-----------|---------| | 6/2/2017 | 100 | | 5/23/2017 | 200 | | 5/20/2017 | 300 | | 6/22/2017 | 400 | | 6/21/2017 | 500 | I need to group the above data by…
Symphony
  • 1,655
  • 4
  • 15
  • 22
39
votes
2 answers

Including the group name in the apply function pandas python

Is there a way to tell the groupby() call to use the group name in the apply() lambda function? Similar to how, if I iterate through groups, I can get the group key via the following tuple decomposition: for group_name, subdf in…
user1129988
  • 1,516
  • 4
  • 19
  • 32
38
votes
2 answers

pandas: GroupBy .pipe() vs .apply()

In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() method accepting the same lambda would return the same results. In [195]: import numpy as np In [196]: n = 1000 In [197]: df =…
foglerit
  • 7,792
  • 8
  • 44
  • 64
35
votes
3 answers

Combine duplicated columns within a DataFrame

If I have a dataframe with several columns that share the same name, is there a way to combine those columns with some sort of function (e.g. sum)? For instance with: In [186]: df["NY-WEB01"].head() Out[186]: …
Kyle Brandt
  • 26,938
  • 37
  • 124
  • 165