Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition. Use it only for questions that involve the `pandas` library.

pandas.DataFrame.groupby lets you split the rows of a DataFrame into groups based on the values in one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
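
As a minimal sketch of the split-then-aggregate pattern described above (the column names `team` and `score` are made up for illustration):

```python
import pandas as pd

# Hypothetical data: two teams with two scores each.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1.0, 3.0, 10.0, 20.0],
})

# Split the rows by team, then compute the mean score of each group.
mean_by_team = df.groupby("team")["score"].mean()
print(mean_by_team)
```

Other reductions such as `sum`, `min`, or `count` can be used in place of `mean` in the same position.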

8780 questions
33 votes, 3 answers

Boxplot with pandas groupby multiindex, for specified sublevels from multiindex

OK, so I have a dataframe which contains timeseries data that has a multiline index for each column. Here is a sample of what the data looks like; it is in CSV format. Loading the data is not an issue here. What I want to do is to be able to…
pbreach
  • 16,049
  • 27
  • 82
  • 120
32 votes, 7 answers

Pandas groupby to to_csv

I want to output a Pandas groupby dataframe to CSV. I tried various Stack Overflow solutions, but they have not worked. Python 3.6.1, Pandas 0.20.1. The groupby result looks like: id month year count week 0 9066 82 32142 895 1 …
kalmdown
  • 601
  • 1
  • 9
  • 13
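
For the CSV question above, one possible approach is to aggregate first and write the resulting DataFrame, since `to_csv` is a DataFrame method. This is only a sketch under assumptions: the data and the `sum` aggregation are invented, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical data; the real frame in the question is not fully shown.
df = pd.DataFrame({"week": [0, 0, 1], "count": [1, 2, 3]})

# Aggregate, then turn the group keys back into an ordinary column.
summary = df.groupby("week")["count"].sum().reset_index()

# Write to an in-memory buffer; a file path works the same way.
buffer = io.StringIO()
summary.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```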
32 votes, 3 answers

Sorting the grouped data as per group size in Pandas

I have two columns in my dataset, col1 and col2. I want to group the data as per col1 and then sort the data as per the size of each group. That is, I want to display groups in ascending order of their size. I have written the code for grouping and…
krackoder
  • 2,841
  • 7
  • 42
  • 51
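
A rough sketch of sorting by group size, assuming a frame with the `col1` and `col2` columns named in the question (the values and the helper column `group_size` are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["b", "a", "b", "c", "b", "a"],
    "col2": [1, 2, 3, 4, 5, 6],
})

# Size of each group, smallest first.
sizes = df.groupby("col1").size().sort_values()

# Attach each row's group size, then sort the rows by it.
# mergesort is stable, so rows keep their original order within a size.
df["group_size"] = df.groupby("col1")["col2"].transform("size")
sorted_df = df.sort_values("group_size", kind="mergesort")
print(sorted_df)
```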
31 votes, 2 answers

Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

I have the following data frame and want to: (1) group records by month; (2) sum QTY_SOLD and NET_AMT of each unique UPC_ID (per month); (3) include the rest of the columns as well in the resulting dataframe. The way I thought I can do this is: 1st, create a month…
user3871
  • 12,432
  • 33
  • 128
  • 268
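
One way to keep non-aggregated columns is to give them an explicit reducer such as `first`. The sketch below assumes this is acceptable for the data at hand; `QTY_SOLD`, `NET_AMT`, and `UPC_ID` come from the question, while `MONTH`, `DESCRIPTION`, and all values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "MONTH": [1, 1, 1, 2],
    "UPC_ID": [111, 111, 222, 111],
    "DESCRIPTION": ["soap", "soap", "tea", "soap"],
    "QTY_SOLD": [2, 3, 1, 4],
    "NET_AMT": [4.0, 6.0, 5.0, 8.0],
})

# Sum the numeric columns; carry the descriptive column along with "first".
result = df.groupby(["MONTH", "UPC_ID"], as_index=False).agg(
    {"QTY_SOLD": "sum", "NET_AMT": "sum", "DESCRIPTION": "first"}
)
print(result)
```

Taking `first` only makes sense when the extra columns are constant within each group; otherwise information is silently dropped.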
29 votes, 3 answers

Get only the first and last rows of each group with pandas

I am a newbie in Python. I have a huge dataframe with millions of rows and IDs. My data looks like this: Time ID X Y 8:00 A 23 100 9:00 B 24 110 10:00 B 25 120 11:00 C 26 130 12:00 C 27 140 13:00 A 28 150 14:00 A …
Arief Hidayat
  • 937
  • 1
  • 8
  • 19
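
A possible sketch for picking the first and last row of each group, using `head(1)` and `tail(1)` on the grouped object. The data here is invented (only the `ID` and `X` column names echo the question), and a single-row group is included to show why duplicates need to be dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "B", "A", "A", "C"],
    "X": [23, 24, 25, 26, 27, 28],
})

grouped = df.groupby("ID")

# head(1) and tail(1) keep the original row index, so the pieces can be
# concatenated and restored to their original order.
first_last = pd.concat([grouped.head(1), grouped.tail(1)])

# A group with a single row appears twice; drop the repeated index label.
first_last = first_last[~first_last.index.duplicated()].sort_index()
print(first_last)
```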
27 votes, 4 answers

When is it appropriate to use df.value_counts() vs df.groupby('...').count()?

I've heard that in Pandas there are often multiple ways to do the same thing, but I was wondering: if I'm trying to group data by a value within a specific column and count the number of items with that value, when does it make sense to use…
Ollie Khakwani
  • 754
  • 1
  • 8
  • 17
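
To make the comparison concrete, here is a small sketch on invented data with missing values. It illustrates three differences worth knowing: `value_counts` sorts by frequency, `groupby(...).count()` counts non-null entries of the selected column, and `groupby(...).size()` counts rows.

```python
import pandas as pd

# Hypothetical data with one missing key and one missing value.
df = pd.DataFrame({
    "color": ["red", "blue", "red", None],
    "value": [1.0, None, 3.0, 4.0],
})

# Frequency of each color, most common first; missing keys are skipped.
vc = df["color"].value_counts()

# Non-null entries of "value" per color, ordered by the group key.
gc = df.groupby("color")["value"].count()

# Number of rows per color, regardless of missing values in other columns.
gs = df.groupby("color").size()

print(vc, gc, gs, sep="\n\n")
```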
27 votes, 4 answers

Pandas: groupby column A and make lists of tuples from other columns?

I would like to aggregate user transactions into lists in pandas. I can't figure out how to make a list comprised of more than one field. For example, df = pd.DataFrame({'user':[1,1,2,2,3], 'time':[20,10,11,18, 15], …
MrCartoonology
  • 1,997
  • 4
  • 22
  • 38
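
One way to collect several fields into a list of tuples per user is to build the tuples as a column first and then aggregate with `list`. The `user` and `time` values follow the question; the `amount` column and the helper column `pair` are hypothetical, since the original frame is cut off.

```python
import pandas as pd

df = pd.DataFrame({
    "user": [1, 1, 2, 2, 3],
    "time": [20, 10, 11, 18, 15],
    "amount": [10.99, 4.99, 2.99, 1.99, 10.99],  # hypothetical column
})

# Pair up the fields row by row, then gather the pairs for each user.
df["pair"] = list(zip(df["time"], df["amount"]))
pairs = df.groupby("user")["pair"].agg(list)
print(pairs)
```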
27 votes, 3 answers

Python Pandas max value in a group as a new column

I am trying to calculate a new column which contains maximum values for each of several groups. I'm coming from a Stata background so I know the Stata code would be something like this: by group, sort: egen max = max(odds) For example: data =…
Vicki
  • 315
  • 1
  • 3
  • 6
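
A rough pandas counterpart to the Stata `egen max` line is `transform("max")`, which returns one value per original row. The `group` and `odds` names come from the question; the values and the new column name `max_odds` are made up.

```python
import pandas as pd

data = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "odds": [1.5, 2.0, 0.5, 3.0, 1.0],
})

# transform broadcasts each group's maximum back onto its rows.
data["max_odds"] = data.groupby("group")["odds"].transform("max")
print(data)
```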
26 votes, 2 answers

Check if all elements in a group are equal using pandas GroupBy

Is there a pythonic way to group by a field and check if all elements of each resulting group have the same value? Sample data: datetime rating signal 0 2018-12-27 11:33:00 IG 0 1 2018-12-27 11:33:00 HY -1 2 …
Yuca
  • 6,010
  • 3
  • 22
  • 42
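
One possible check is to count distinct values per group and compare with 1. The sketch assumes the grouping key is `datetime` and the column being checked is `rating`; the question does not say which column is meant, and the second timestamp is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": ["2018-12-27 11:33:00", "2018-12-27 11:33:00",
                 "2018-12-27 11:34:00", "2018-12-27 11:34:00"],
    "rating": ["IG", "HY", "IG", "IG"],
})

# True where every rating in the group is the same value.
# nunique ignores missing values, so an all-NaN group gives False.
all_equal = df.groupby("datetime")["rating"].nunique().eq(1)
print(all_equal)
```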
25 votes, 5 answers

Pandas aggregate with dynamic column names

I have a script that generates a pandas data frame with a varying number of value columns. As an example, this df might be import pandas as pd df = pd.DataFrame({ 'group': ['A', 'A', 'A', 'B', 'B'], 'group_color' : ['green', 'green', 'green',…
MartijnVanAttekum
  • 1,405
  • 12
  • 20
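
When the value columns are not known in advance, the aggregation spec can be built programmatically. This is a sketch under assumptions: the `val1` and `val2` columns and the choice of `sum` are hypothetical, and the named-aggregation form needs pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B"],
    "val1": [5, 2, 3, 4, 5],   # hypothetical value column
    "val2": [4, 2, 8, 5, 7],   # hypothetical value column
})

# Discover the value columns at run time.
value_cols = [col for col in df.columns if col != "group"]

# Dict form: keeps the original column names.
out = df.groupby("group").agg({col: "sum" for col in value_cols})

# Named aggregation: output names are generated from the input names.
out_named = df.groupby("group").agg(
    **{f"{col}_sum": (col, "sum") for col in value_cols}
)
print(out_named)
```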
25 votes, 5 answers

concise way of flattening multiindex columns

Using more than 1 function in a groupby-aggregate results in a multi-index which I then want to flatten. example: df = pd.DataFrame( {'A': [1,1,1,2,2,2,3,3,3], 'B': np.random.random(9), 'C': np.random.random(9)} ) out =…
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
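
A short sketch of one common way to flatten the resulting column MultiIndex, by joining the levels of each column tuple. Fixed values replace the random data from the question so the output is reproducible, and the pair of functions (`mean`, `sum`) is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1.0, 2.0, 3.0, 4.0],
    "C": [10.0, 20.0, 30.0, 40.0],
})

# Two functions per column produce two-level column labels.
out = df.groupby("A").agg(["mean", "sum"])

# Each label is a tuple such as ("B", "mean"); join it into "B_mean".
out.columns = ["_".join(col) for col in out.columns]
print(out)
```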
24 votes, 2 answers

pandas groupby where you get the max of one column and the min of another column

I have a dataframe as follows: user num1 num2 a 1 1 a 2 2 a 3 3 b 4 4 b 5 5 I want a dataframe which has the minimum from num1 for each user, and the maximum of num2 for each user.…
lhay86
  • 696
  • 2
  • 5
  • 18
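
Using the sample frame from the question, a dict passed to `agg` can assign a different function to each column. This is one possible approach, sketched here:

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "num1": [1, 2, 3, 4, 5],
    "num2": [1, 2, 3, 4, 5],
})

# Minimum of num1 and maximum of num2 for each user.
result = df.groupby("user").agg({"num1": "min", "num2": "max"})
print(result)
```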
23 votes, 4 answers

Pandas groupby mean - into a dataframe?

Say my data looks like…
Craig
  • 1,929
  • 5
  • 30
  • 51
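
Since the question body is cut off, the following is only a guess at the intent: getting the group means back as a regular DataFrame rather than an indexed result. Passing `as_index=False` is one way to do that; the `name` and `value` columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["x", "x", "y"],
    "value": [1.0, 3.0, 5.0],
})

# as_index=False keeps the group key as an ordinary column.
means = df.groupby("name", as_index=False).mean()
print(means)
```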
23 votes, 2 answers

Groupby in python pandas: Fast Way

I want to improve the time of a groupby in python pandas. I have this code: df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len) The objective is to count how many contracts a client has in a month and add this information…
Náthali
  • 937
  • 2
  • 10
  • 22
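
A commonly suggested speed-up for code like the `transform(len)` call above is to pass a built-in aggregation name instead of a Python callable, which lets pandas use its optimized path. The sketch below uses the column names from the question with invented values; actual timings depend on the data and the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "Client": ["c1", "c1", "c1", "c2"],
    "Month": [1, 1, 2, 1],
    "Contrat": ["x", "y", "z", "w"],
})

# "size" counts the rows in each (Client, Month) group, like len does.
df["Nbcontrats"] = df.groupby(["Client", "Month"])["Contrat"].transform("size")
print(df)
```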
22 votes, 4 answers

Using Grouped Map Pandas UDFs with arguments

I want to use data.groupby.apply() to apply a function to each row of my PySpark DataFrame per group. I used the Grouped Map Pandas UDFs. However, I can't figure out how to add another argument to my function. I tried using the argument as a global…
Yasmine
  • 321
  • 1
  • 2
  • 4