Questions tagged [pandas-groupby]

Use this tag for questions about grouping data based on a given condition. It applies only to questions involving the `pandas` library.

`pandas.DataFrame.groupby` lets you split the rows of a DataFrame into groups based on the values of one or more columns.

After grouping, you can compute the mean of each group or apply other aggregations and transformations.
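
As a minimal sketch of the group-then-aggregate pattern this tag covers, the snippet below groups a small made-up DataFrame by one column and takes per-group statistics. The `team` and `score` column names and the data are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: the column names are assumptions for illustration.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Group rows by the values in "team", then take the mean of "score" per group.
means = df.groupby("team")["score"].mean()

# Several aggregations can be requested at once with .agg().
summary = df.groupby("team")["score"].agg(["mean", "count"])

print(means)
print(summary)
```

Here `means` is a Series indexed by the group keys, and `summary` is a DataFrame with one column per requested aggregation.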

8780 questions
2 votes
2 answers

Create Dummy Columns for values in Single Pandas Column and Group into single row

I am trying to take a pandas dataframe and perform a pivot like operation on a single column. I want to take multiple rows (grouped by some identification columns) and convert that single column into dummy indicator variables. I know of…
Coldchain9
2 votes
2 answers

Python Pandas: Delete duplicate rows based on one column and concatenate information from multiple columns

I have a pandas dataframe that contains duplicates according to one column (ID), but has differing values in several other columns. My goal is to remove the duplicates based on ID, but to concatenate the information from the other columns. Here is…
Anna
2 votes
2 answers

Transform each group in a DataFrame

I have the following DataFrame: id x y timestamp sensorTime 1 32 30 1031 2002 1 4 105 1035 2005 1 8 110 1050 2006 2 18 10 1500 3600 2 40 20 1550…
machinery
2 votes
1 answer

ngroup with multiple group by python

Given a DataFrame, I would like the group number of the values in one column id1, within each group of a second column id2. I tried ngroup() to identify unique number groups by id1 and id2. Here is an example df: id1 id2 0 1123 123 1 1123 …
shbrn
2 votes
2 answers

How to get the cumulative count based on two columns

Let's say we have the following dataframe. If we wanted to find the count of consecutive 1's, you could use the below. col 0 0 1 1 2 1 3 1 4 0 5 0 6 1 7 1 8 0 9 1 10 1 11 1 12 1 13 0 14 1 15 …
rhug123
2 votes
2 answers

How do you change input parameters of pandas groupby.agg function?

I am having issues using the groupby_object.agg() method with functions where I want to change the input parameters. Is there a resource available of function names .agg() accepts, and how to pass parameters to them? See an example below: import…
bkeesey
2 votes
1 answer

Pass arrays from DataFrame into function with arrays grouped and flattened

I have a dataframe with X position data for hundreds of participants, and three grouping variables (with each participant's X data being 1000 points in length). Preview of dataframe: X Z participantNum obsScenario startPos …
2 votes
2 answers

How to get rows when specific column value is continuous for certain number of rows

I want to extract rows when the column x value remains the same for more than five consecutive rows. x x2 0 5 5 1 4 5 2 10 6 3 10 5 4 10 6 5 10 78 6 10 89 7 10 78 8 10 98 9 10 8 10 10 56 11 60 45 12 …
Nickel
2 votes
3 answers

Add Sum to all grouped rows in pandas dataframe

I have a dataframe and I want to group by its "First" and "Second" columns and then produce the expected output shown below: df = pd.DataFrame({'First':list('abcababcbc'),…
2 votes
1 answer

Vectorized way to store group name (from groupby) into a new column of the original DataFrame?

Having a DataFrame with a timestamp column, thanks to groupby, pd.Grouper and a for loop, I am able to group rows by periods and keep track of the group label in the original DataFrame. For instance, considering following DataFrame, and periods of 2…
pierre_j
2 votes
2 answers

Groupby into list for non consecutive values

I am trying to group by this dataset col1 col2 0 A 1 1 B 1 2 C 1 3 D 3 4 E 3 5 F 2 6 G 2 7 H 1 8 I 1 9 j 2 10 K 2 into this 1 : [A, B, C] 3: [D, E] 2: [ F; G] 1: [ H, I] 2: [ J,K] so it has to…
ombk
2 votes
2 answers

Python pandas - group by column A and prevent duplicated existence on column b?

Suppose I have a dataframe df = pd.DataFrame({"SKU": ["Coke", "Coke", "Coke", "Bread", "Bread", "Bread", "cake", "cake", "cake"], "campaign":["buy1get1","$19", "event", "buy1get1","$19", "event", "buy1get1","$19", "event"], …
Leigh Tsai
2 votes
1 answer

groupby filter for accounts where monthly balances are all negative

Here is my sample data - import pandas as pd df = pd.DataFrame({'Account': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A3', 'A3', 'A3', 'A3'], 'Date': ['D1', 'D2', 'D3', 'D4', 'D1', 'D2', 'D3', 'D4', 'D1', 'D2', 'D3',…
Vivek Kalyanarangan
2 votes
2 answers

pandas groupby time of day with 15 minute bins

I have some time series data that spans multiple days, like so: dr = pd.date_range('01-01-2020 9:00', '01-03-2020 23:59', freq='1T') df = pd.DataFrame({'data': 1}, index=dr) # all ones in the data column I am interested in grouping by the time of…
Tom
2 votes
1 answer

Sum values of columns that start with the same text string

I want to take the sum of values (row-wise) of columns that start with the same text string. Underneath is my original df with fails on courses. Original df: ID P_English_2 P_English_3 P_German_1 P_Math_1 P_Math_3 P_Physics_2 P_Physics_4 56 …
Matthi9000