Questions tagged [pandas-groupby]

Use this tag for questions about grouping rows that share a key or satisfy a condition, and only for questions about the `pandas` library.

`pandas.DataFrame.groupby` splits the rows of a DataFrame into groups based on the values of one or more columns, or on other keys such as arrays, functions, or index levels.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
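A minimal sketch of the split-apply-combine pattern described above, using hypothetical column names `team` and `score`. It shows an aggregation (one value per group) and a transformation (a per-group value broadcast back onto every row).

```python
import pandas as pd

# Hypothetical example data: a grouping key and a numeric column
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [10, 20, 30, 60, 50],
})

# Aggregation: one value per group (here the mean score per team)
means = df.groupby("team")["score"].mean()

# Several aggregations at once: one row per group, one column per function
summary = df.groupby("team")["score"].agg(["mean", "max", "size"])

# Transformation: the group mean broadcast back onto every original row
df["team_mean"] = df.groupby("team")["score"].transform("mean")

print(means.to_dict())  # {'a': 30.0, 'b': 40.0}
```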

8780 questions
15 votes, 3 answers

Pandas - Groupby with conditional formula

   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1

Given the above dataframe, is there an elegant way to groupby with a condition? I want to…
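The question is truncated, so the goal below is an assumption: a sketch that groups by a boolean condition ("travelled with any family member", a hypothetical criterion) computed from `SibSp` and `Parch`. `groupby` accepts a boolean Series directly, so no helper column is required.

```python
import pandas as pd

df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# Hypothetical condition: did the passenger travel with any family member?
has_family = ((df["SibSp"] > 0) | (df["Parch"] > 0)).rename("has_family")

# Group by the boolean Series itself; it aligns with df on the index
survival_rate = df.groupby(has_family)["Survived"].mean()
```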
14 votes, 1 answer

How to use groupby and apply with polars

I am breaking my head trying to figure out how to use `groupby` and `apply` in Python's polars library. Coming from pandas, I was using: `def get_score(df): return spearmanr(df["prediction"], df["target"]).correlation` correlations =…
jbssm
14 votes, 4 answers

Pandas Split DataFrame using row index

I want to split a dataframe into chunks with an uneven number of rows, using the row index. The code below:
`groups = df.groupby((np.arange(len(df.index))/l[1]).astype(int))`
works only for a uniform number of rows.
df:
a  b  c
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
…
Pradeep Tummala
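A sketch of one way to split by uneven row counts, assuming the desired chunk sizes are known up front (the `sizes` list is hypothetical and must sum to `len(df)`). Each row gets a chunk label via `np.repeat`, and the DataFrame is grouped by those labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(1, 7), "b": range(1, 7), "c": range(1, 7)})

# Hypothetical chunk sizes; they must add up to len(df)
sizes = [2, 1, 3]

# Label each row with its chunk number: [0, 0, 1, 2, 2, 2]
labels = np.repeat(np.arange(len(sizes)), sizes)

# Group by the position-based labels and keep each chunk as its own DataFrame
chunks = [chunk for _, chunk in df.groupby(labels)]
```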
14 votes, 3 answers

Marking the entire group if condition is true for a single row

I have a dataframe which has Dates and public holidays:
Date      WeekNum  Public_Holiday
1/1/2015  1        1
2/1/2015  1        0
3/1/2015  1        0
4/1/2015  1        0
5/1/2015  1        0
6/1/2015  1        0
7/1/2015  1        0
8/1/2015  2        0
9/1/2015  2        …
Ahamed Moosa
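A sketch of marking a whole group when any row in it matches: `transform("max")` returns each week's maximum of the 0/1 flag aligned to every row, so one holiday flags the entire week. The column name `Holiday_Week` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1/1/2015", "2/1/2015", "8/1/2015", "9/1/2015"],
    "WeekNum": [1, 1, 2, 2],
    "Public_Holiday": [1, 0, 0, 0],
})

# 'transform' broadcasts each group's max back to every row of that group
df["Holiday_Week"] = df.groupby("WeekNum")["Public_Holiday"].transform("max")
```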
14 votes, 1 answer

Python Dataframe: Calculating R^2 and RMSE Using Groupby on One Column

I have the following Python dataframe:
Type  Actual  Predicted
A     4       3
A     10      18
A     13      11
B     3       10
B     4       2
B     8       33
C     20      17
C     40      33
C     87      80
C     32      …
PineNuts0
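A sketch of per-group R² and RMSE computed by hand with NumPy, using the usual definitions R² = 1 − SS_res / SS_tot and RMSE = √(mean squared residual). The helper name `scores` is hypothetical; only the A and B rows from the question are used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "B"],
    "Actual": [4, 10, 13, 3, 4, 8],
    "Predicted": [3, 18, 11, 10, 2, 33],
})

def scores(g: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: R^2 and RMSE for one group."""
    resid = g["Actual"] - g["Predicted"]
    ss_res = (resid ** 2).sum()
    ss_tot = ((g["Actual"] - g["Actual"].mean()) ** 2).sum()
    return pd.Series({
        "r2": 1 - ss_res / ss_tot,
        "rmse": np.sqrt((resid ** 2).mean()),
    })

# Select the value columns so the grouping column is not passed to 'scores'
metrics = df.groupby("Type")[["Actual", "Predicted"]].apply(scores)
```

Returning a Series from the applied function yields one row per `Type` with `r2` and `rmse` columns.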
14 votes, 1 answer

Pandas group by weekday (M/T/W/T/F/S/S)

I have a pandas dataframe containing a time series (as index) of the form YYYY-MM-DD ('arrival_date') and I'd like to group by each of the weekdays (Monday to Sunday) in order to calculate for the other columns the mean, median, std etc. I should…
mannaroth
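A sketch of grouping a date-indexed DataFrame by weekday, assuming a `DatetimeIndex` named `arrival_date` and a hypothetical `value` column. Grouping by `dayofweek` (Monday=0 through Sunday=6) keeps the groups in calendar order, which grouping by day names would not (names sort alphabetically).

```python
import calendar

import pandas as pd

# Hypothetical daily data; 2024-01-01 was a Monday
idx = pd.date_range("2024-01-01", periods=14, freq="D", name="arrival_date")
df = pd.DataFrame({"value": range(14)}, index=idx)

# DatetimeIndex.dayofweek: Monday=0 ... Sunday=6
stats = df.groupby(df.index.dayofweek)["value"].agg(["mean", "median", "std"])

# Replace the 0-6 labels with day names
stats.index = [calendar.day_name[d] for d in stats.index]
```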
14 votes, 2 answers

Using isnull() and groupby() on a pandas dataframe

Suppose I have a dataframe df with columns 'A', 'B', 'C'. I would like to count the number of null values in column 'B' as grouped by 'A' and make a dictionary out of it: Tried the following by…
user8071804
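A sketch of counting nulls per group: `isnull()` yields booleans, grouping that Series by column `A` and summing counts the `True` values, and `to_dict()` produces the dictionary. The data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "y"],
    "B": [1.0, np.nan, np.nan, np.nan, 5.0],
    "C": [1, 2, 3, 4, 5],
})

# Summing booleans per group counts the missing values in 'B'
null_counts = df["B"].isnull().groupby(df["A"]).sum().to_dict()
```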
14 votes, 3 answers

How to groupby based on two columns in pandas?

A similar question might have been asked before, but I couldn't find one that fits my problem exactly. I want to group a dataframe by two columns. For example, to make this
id  product  quantity
1   A        2
1   A        3
1   B        2
2   A…
ARASH
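A sketch of grouping by two columns, assuming the goal is to sum `quantity` for each (`id`, `product`) pair; the fourth row's quantity is hypothetical because the question is truncated. Passing a list of keys groups by every combination, and `as_index=False` keeps the keys as ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "product": ["A", "A", "B", "A"],
    "quantity": [2, 3, 2, 4],  # last value is hypothetical
})

# One output row per (id, product) pair with the summed quantity
totals = df.groupby(["id", "product"], as_index=False)["quantity"].sum()
```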
14 votes, 3 answers

Pandas Dataframe: how to add column with number of occurrences in other column

I have the following df:
Col1   Col2
test   Something
test2  Something
test3  Something
test   Something
test2  Something
test5  Something
I want to get
Col1   Col2       Occur
test   Something  2
test2  Something  2
test3  …
Laser
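A sketch of adding an occurrence count per value: `transform("size")` returns each row's group size aligned to the original rows, so the result can be assigned directly as a new column.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# Each row receives the number of rows sharing its Col1 value
df["Occur"] = df.groupby("Col1")["Col1"].transform("size")
```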
14 votes, 4 answers

Replacing values with groupby means

I have a DataFrame with a column that has some bad data with various negative values. I would like to replace values < 0 with the mean of the group that they are in. For missing values as NAs, I would do: data =…
Def_Os
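A sketch of replacing negative values with their group mean, assuming the mean should be computed from the valid (non-negative) values only. Negatives are masked to NaN first, then filled with the per-group mean from `transform`. The column names `group` and `value` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1.0, -5.0, 3.0, 10.0, -2.0],
})

# Treat negative values as missing so they do not distort the group mean
clean = df["value"].mask(df["value"] < 0)

# Fill each gap with the mean of the valid values in the same group
df["value"] = clean.fillna(clean.groupby(df["group"]).transform("mean"))
```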
13 votes, 3 answers

error reindex from a duplicate axis in groupby

The `by = "B"` block has duplicated indices in both case1 and case2, so why does case1 work but case2 does not? case1: `df1 = pd.DataFrame({"a":[0,100,200], "by":["A","B","B"]}, index=[0,1,1])` then `df1.groupby("by").diff()` (result is okay). case2: df2 =…
junliang
13 votes, 1 answer

pandas: get all groupby values in an array

I'm sure this has been asked before, sorry if duplicate. Suppose I have the following dataframe: `df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'], 'data': range(6)}, columns=['key', 'data'])`
  key  data
0   A     0
1 …
ru111
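A sketch of collecting each group's values into a list, using the DataFrame from the question: `apply(list)` on the grouped column returns a Series indexed by `key` whose elements are Python lists.

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "B", "C", "A", "B", "C"], "data": range(6)},
                  columns=["key", "data"])

# Each element of the result is the list of 'data' values for that key
values = df.groupby("key")["data"].apply(list)
```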
13 votes, 5 answers

pandas - how to get last n groups of a groupby object and combine them as a dataframe

How do I get the last `n` groups after `df.groupby()` and combine them into a dataframe? `data = pd.read_sql_query(sql=sqlstr, con=sql_conn, index_col='SampleTime')` `grouped = data.groupby(data.index.date, sort=False)` After doing `grouped.ngroups` I am getting…
stockade
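A sketch of keeping only the last `n` groups, with hypothetical sample data standing in for the SQL query. `ngroup()` numbers groups in iteration order (order of first appearance when `sort=False`), so a boolean mask on that number selects the trailing groups and returns them as one DataFrame.

```python
import pandas as pd

# Hypothetical sample data: three readings per day for four days
idx = pd.date_range("2024-01-01", periods=12, freq=pd.Timedelta(hours=8),
                    name="SampleTime")
data = pd.DataFrame({"reading": range(12)}, index=idx)

n = 2  # number of trailing groups to keep

# Group number for every row, in the order groups are iterated
group_no = data.groupby(data.index.date, sort=False).ngroup()

# Keep rows whose group number is among the last n
last_n = data[group_no.to_numpy() >= group_no.max() - n + 1]
```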
13 votes, 1 answer

Why Pandas gives AttributeError: 'SeriesGroupBy' object has no attribute 'pct'?

I'm trying to pass a user defined function pct to Pandas agg method, and it works if I only pass that function but it doesn't when I use the dictionary format for defining the functions. Does anyone know why? import pandas as pd df =…
Franco Piccolo
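The question body is truncated, so this is only one plausible cause: if `pct` is given as the string `'pct'` inside the `agg` dictionary, pandas looks it up as a GroupBy method name and raises this `AttributeError`. Passing the function object itself works. The body of `pct` below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y", "y"], "val": [1, -1, 2, 3]})

def pct(s: pd.Series) -> float:
    """Hypothetical custom aggregation: percentage of positive values."""
    return (s > 0).mean() * 100

# Pass the callable, not the string 'pct'
result = df.groupby("key").agg({"val": pct})
```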
13 votes, 3 answers

pandas groupby aggregate element-wise list addition

I have a pandas dataframe that looks as follows:
X      Y
71455  [334.0, 319.0, 298.0, 323.0]
71455  [3.0, 8.0, 13.0, 10.0]
57674  [54.0, 114.0, 124.0, 103.0]
I want to perform an aggregate groupby that adds the lists…
Kermit754
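A sketch of element-wise list addition per group, assuming all lists within a group have the same length. Each group's lists are stacked into a 2-D NumPy array and summed column-wise; `apply` is used because the function returns a list rather than a scalar.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X": [71455, 71455, 57674],
    "Y": [[334.0, 319.0, 298.0, 323.0],
          [3.0, 8.0, 13.0, 10.0],
          [54.0, 114.0, 124.0, 103.0]],
})

# Stack each group's lists into rows of a 2-D array, then sum down the columns
summed = df.groupby("X")["Y"].apply(
    lambda s: np.vstack(s.tolist()).sum(axis=0).tolist()
)
```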