Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition. Use it only in connection with the `pandas` library.

pandas.DataFrame.groupby lets you group the rows of a DataFrame into categories based on the values of one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations.
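
As a rough illustration of the description above, here is a minimal sketch of grouping and aggregating. The DataFrame contents and the column names `team` and `score` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

# Mean of `score` within each `team`.
mean_by_team = df.groupby("team")["score"].mean()

# Several aggregations at once, using named aggregation.
summary = df.groupby("team").agg(
    total=("score", "sum"),
    n_rows=("score", "count"),
)

print(mean_by_team)
print(summary)
```

Selecting a column before aggregating (`["score"]`) returns a Series indexed by group, while calling `agg` on the grouped DataFrame returns one row per group and one column per named aggregation.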

8780 questions
18
votes
2 answers

df.groupby(...).agg(set) produces different result compared to df.groupby(...).agg(lambda x: set(x))

While answering this question, it turned out that df.groupby(...).agg(set) and df.groupby(...).agg(lambda x: set(x)) produce different results. Data: df = pd.DataFrame({ 'user_id': [1, 2, 3, 4, 1, 2, 3], 'class_type': ['Krav Maga',…
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
18
votes
2 answers

TypeError: unhashable type: 'list' when use groupby in python

Something goes wrong when I use the groupby method: data = pd.Series(np.random.randn(100),index=pd.date_range('01/01/2001',periods=100)) keys = lambda x: [x.year,x.month] data.groupby(keys).mean() but it raises an error: TypeError: unhashable type:…
littlely
  • 1,368
  • 3
  • 18
  • 36
18
votes
3 answers

Time difference within group by objects in Python Pandas

I have a dataframe that looks like this: from to datetime other ------------------------------------------------- 11 1 2016-11-06 22:00:00 - 11 1 2016-11-06 20:00:00 - 11 1 …
Gingerbread
  • 1,938
  • 8
  • 22
  • 36
18
votes
2 answers

Bar graph from dataframe groupby

import pandas as pd import numpy as np import matplotlib.pyplot as plt df = pd.read_csv("arrests.csv") df = df.replace(np.nan,0) df = df.groupby(['home_team'])['arrests'].mean() I'm trying to create a bar graph for the dataframe. Under home_team are a…
jhaywoo8
  • 767
  • 5
  • 13
  • 23
18
votes
2 answers

Groupby, transpose and append in Pandas?

I have a dataframe which looks like this: Each user has 10 records. Now, I want to create a dataframe which looks like this: userid name1 name2 ... name10 which means I need to invert every 10 records of the column name and append to a new…
Dawny33
  • 10,543
  • 21
  • 82
  • 134
18
votes
2 answers

Pandas: plot multiple time series DataFrame into a single plot

I have the following pandas DataFrame: time Group blocks 0 1 A 4 1 2 A 7 2 3 A 12 3 4 A 17 4 5 A 21 5 6 A …
ShanZhengYang
  • 16,511
  • 49
  • 132
  • 234
17
votes
2 answers

Pandas GroupBy.agg() throws TypeError: aggregate() missing 1 required positional argument: 'arg'

I’m trying to create multiple aggregations of the same field. I’m working in pandas, in python3.7. The syntax seems pretty straightforward based on the…
user3476463
  • 3,967
  • 22
  • 57
  • 117
17
votes
4 answers

why does pandas rolling use single dimension ndarray

I was motivated to use pandas rolling feature to perform a rolling multi-factor regression (This question is NOT about rolling multi-factor regression). I expected that I'd be able to use apply after a df.rolling(2) and take the resulting…
piRSquared
  • 285,575
  • 57
  • 475
  • 624
16
votes
2 answers

Pandas groupby apply vs transform with specific functions

I don't understand which functions are acceptable for groupby + transform operations. Often, I end up just guessing, testing, reverting until something works, but I feel there should be a systematic way of determining whether a solution will…
jpp
  • 159,742
  • 34
  • 281
  • 339
16
votes
2 answers

DataFrame: add column with the size of a group

I have the following dataframe: fsq digits digits_type 0 1 1 odd 1 2 1 odd 2 3 1 odd 3 11 2 even 4 22 2 even 5 101 3 odd 6 111 3 odd and I want to add a last column, count,…
luffe
  • 1,588
  • 3
  • 21
  • 32
15
votes
3 answers

Group pandas dataframe in unusual way

Problem I have the following Pandas dataframe: data = { 'ID': [100, 100, 100, 100, 200, 200, 200, 200, 200, 300, 300, 300, 300, 300], 'value': [False, False, True, False, False, True, True, True, False, False, False, True, True,…
Ford1892
  • 741
  • 2
  • 9
  • 20
15
votes
5 answers

What are 25%,50%,75% values when we describe a grouped dataframe?

I am going through pandas groupby docs and when I groupby on particular column as below: df: A B C D 0 foo one -0.987674 0.039616 1 bar one -0.653247 -1.022529 2 foo two 0.404201 1.308777 3 bar three …
KcH
  • 3,302
  • 3
  • 21
  • 46
15
votes
5 answers

Filter a data-frame and add a new column according to the given condition

I have a data frame like this ID col1 col2 1 Abc street 2017-07-27 1 None 2017-08-17 1 Def street 2018-07-15 1 None 2018-08-13 2 fbg street 2018-01-07 2 None …
No_body
  • 832
  • 6
  • 21
15
votes
4 answers

Filter groups after GroupBy in pandas while keeping the groups

In pandas I want to do: df.groupby('A').filter(lambda x: x.name > 0), that is, group by column A and then filter out the groups whose name is non-positive. However, this cancels the grouping, as GroupBy.filter returns a DataFrame and thus losing the…
Péťa Poliak
  • 393
  • 1
  • 3
  • 11
15
votes
6 answers

How to sum a column grouped by other columns in a list?

I have a list as follows. [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'], ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']] I would like to sum up the last column grouped by the other columns. The result is like…
Deepleeqe
  • 317
  • 1
  • 8