Questions tagged [pandas-groupby]

Use this tag for questions about grouping data by one or more keys or conditions. It applies only to questions involving the `pandas` library.

pandas.DataFrame.groupby splits the rows of a DataFrame into groups based on the values in one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
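
As a rough illustration of the description above, here is a minimal sketch of grouping and then aggregating. The DataFrame and its `team` and `score` columns are made up for this example and do not come from any question on this page.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 20, 30, 40, 50],
})

# Mean of "score" within each "team" group
mean_by_team = df.groupby("team")["score"].mean()

# Number of rows in each group
size_by_team = df.groupby("team").size()

# Several aggregations at once, one output column per function
summary = df.groupby("team")["score"].agg(["min", "max", "sum"])

print(mean_by_team)
print(size_by_team)
print(summary)
```

Selecting a column before aggregating (`["score"]`) keeps the result to the values of interest; aggregating the whole grouped frame would instead apply the function to every remaining column.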

8780 questions

5 answers

What's the equivalent of pandas' value_counts() in PySpark?

I have the following Python/pandas command: df.groupby('Column_Name').agg(lambda x: x.value_counts().max()) which gets the value counts for ALL columns in a DataFrameGroupBy object. How do I do this in PySpark?

asked Jun 27 '18 at 13:08

TSAR

2 answers

How to do group by on a multiindex in pandas?

Below is my dataframe. I made some transformations to create the category column and dropped the original column it was derived from. Now I need to do a group-by to remove the dups e.g. Love and Fashion can be rolled up via a groupby…

python pandas dataframe pandas-groupby multi-index

asked Nov 05 '13 at 20:24

Tampa

6 answers

Pandas groupby with categories with redundant nan

I am having issues using pandas groupby with categorical data. Theoretically, it should be super efficient: you are grouping and indexing via integers rather than strings. But it insists that, when grouping by multiple categories, every combination…

python pandas numpy group-by pandas-groupby

asked Jan 27 '18 at 01:12

jpp

3 answers

Pandas, groupby and count

I have a dataframe say like this >>> df = pd.DataFrame({'user_id':['a','a','s','s','s'], 'session':[4,5,4,5,5], 'revenue':[-1,0,1,2,1]}) >>> df revenue session user_id 0 -1 4 a 1 …

python pandas pandas-groupby

asked Nov 16 '17 at 02:29

GoingMyWay

3 answers

get first and last values in a groupby

I have a dataframe df df = pd.DataFrame(np.arange(20).reshape(10, -1), [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']], ['X', 'Y']) How…

python pandas dataframe group-by pandas-groupby

asked Aug 05 '16 at 20:23

Brian

1 answer

Transform vs. aggregate in Pandas

When grouping a Pandas DataFrame, when should I use transform and when should I use aggregate? How do they differ with respect to their application in practice and which one do you consider more important?

python pandas pandas-groupby aggregation

asked Dec 04 '16 at 11:05

Sylvi0202

3 answers

Pandas GroupBy.apply method duplicates first group

My first SO question: I am confused about this behavior of apply method of groupby in pandas (0.12.0-4), it appears to apply the function TWICE to the first row of a data frame. For example: >>> from pandas import Series, DataFrame >>> import pandas…

python pandas group-by pandas-groupby

asked Jan 27 '14 at 19:37

NC maize breeding Jim

6 answers

Python Pandas: Calculate moving average within group

I have a dataframe containing time series for 100 objects: object period value 1 1 24 1 2 67 ... 1 1000 56 2 1 59 2 2 46 ... 2 1000 64 3 1 54 ... 100 1 …

python pandas pandas-groupby moving-average

asked Nov 16 '18 at 13:40

Alexandr Kapshuk

5 answers

Groupby class and count missing values in features

I have a problem and I cannot find any solution on the web or in the documentation, even though I think it is very trivial. What do I want to do? I have a dataframe like this CLASS FEATURE1 FEATURE2 FEATURE3 X A NaN NaN X NaN …

python pandas dataframe group-by pandas-groupby

asked Dec 27 '18 at 15:15

codlix

3 answers

pandas groupby dropping columns

I'm doing a simple group by operation, trying to compare group means. As you can see below, I have selected specific columns from a larger dataframe, from which all missing values have been removed. But when I group by, I am losing a couple of…

python pandas dataframe pandas-groupby

asked Jun 01 '16 at 18:11

user3334415

3 answers

Python Pandas Conditional Sum with Groupby

Using sample data: df = pd.DataFrame({'key1' : ['a','a','b','b','a'], 'key2' : ['one', 'two', 'one', 'two', 'one'], 'data1' : np.random.randn(5), 'data2' : np.random.randn(5)}) df data1 data2…

python pandas pandas-groupby

asked Jun 23 '13 at 23:06

AllenQ

6 answers

How can I group by month from a date field using Python and Pandas?

I have a dataframe, df, which is as follows: | date | Revenue | |-----------|---------| | 6/2/2017 | 100 | | 5/23/2017 | 200 | | 5/20/2017 | 300 | | 6/22/2017 | 400 | | 6/21/2017 | 500 | I need to group the above data by…

python pandas pandas-groupby

asked Jul 04 '17 at 14:19

Symphony

2 answers

Including the group name in the apply function pandas python

Is there a way to tell the groupby() call to use the group name in the apply() lambda function? This would be similar to iterating through groups, where I can get the group key via the following tuple decomposition: for group_name, subdf in…

python pandas pandas-groupby apply

asked Sep 08 '15 at 14:36

user1129988

2 answers

pandas: GroupBy .pipe() vs .apply()

In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() method accepting the same lambda would return the same results. In [195]: import numpy as np In [196]: n = 1000 In [197]: df =…

python python-3.x pandas pandas-groupby

asked Nov 10 '17 at 15:41

foglerit

3 answers

Combine duplicated columns within a DataFrame

If I have a dataframe that has columns that include the same name, is there a way to combine the columns that have the same name with some sort of function (i.e. sum)? For instance with: In [186]: df["NY-WEB01"].head() Out[186]: …

python pandas dataframe group-by pandas-groupby

asked Oct 25 '12 at 23:19

Kyle Brandt
