2

I have the following data table:

import pandas as pd

A = [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
B = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df = pd.DataFrame({'A':A, 'B':B})

I'd like to calculate the average of column A over each run of consecutive rows where column B equals 1. Rows where column B equals 0 should be ignored. The result should be a new dataframe like the one below:

Desired table

Thanks for your help!

bgeaibreyi

3 Answers

1

Try:

m = (df.B != df.B.shift(1)).cumsum() * df.B
df_out = df.groupby(m[m > 0])["A"].mean().reset_index(drop=True).to_frame()
df_out["B"] = 1
print(df_out)

Prints:

     A  B
0  2.0  1
1  3.0  1
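To see why the mask works, it may help to print the intermediate values. The sketch below rebuilds the question's data; `run_id` is a name introduced here only for illustration and is not part of the answer's code.

```python
import pandas as pd

A = [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
B = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df = pd.DataFrame({"A": A, "B": B})

# A new run starts wherever B differs from the previous row,
# so the cumulative sum gives every run its own id.
run_id = (df.B != df.B.shift(1)).cumsum()
print(run_id.tolist())  # [1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5]

# Multiplying by B zeroes out the ids of rows where B == 0.
m = run_id * df.B
print(m.tolist())  # [0, 2, 2, 2, 0, 0, 4, 4, 4, 0, 0]

# Grouping by m[m > 0] leaves the B == 0 rows without a key, so they are dropped.
df_out = df.groupby(m[m > 0])["A"].mean().reset_index(drop=True).to_frame()
df_out["B"] = 1
print(df_out)
```

Only the two runs with a non-zero id (2 and 4) survive, which is why the result has two rows.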
Andrej Kesely
1
df1 = df.groupby((df['B'].shift() != df['B']).cumsum()).mean().reset_index(drop=True)
df1 = df1[df1['B'] == 1].astype(int).reset_index(drop=True)
df1

Output

    A   B
0   2   1
1   3   1

Explanation

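We compare each row's value of B with the previous row's value using shift(). Taking the cumulative sum of the resulting booleans gives every run of consecutive equal values its own group id. We then group by that id, calculate each group's mean, and assign the result to a new dataframe, df1.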

This gives the mean of every run of consecutive 0s and of consecutive 1s, so we then keep only the rows where B == 1.
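As a rough illustration of the two steps, the sketch below prints the means of all runs before filtering. The names `run_id`, `all_runs` and `ones_only` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
    "B": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
})

# Label each run of identical consecutive B values.
run_id = (df["B"].shift() != df["B"]).cumsum()

# Mean of every run, including the runs where B == 0.
all_runs = df.groupby(run_id).mean()
print(all_runs)

# Within a run B is constant, so its mean is exactly 0 or 1;
# keep only the runs where it is 1.
ones_only = all_runs[all_runs["B"] == 1].reset_index(drop=True)
print(ones_only)
```

The filter relies on B being constant inside each run, which holds for mean but not for other statistics such as std.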

Utsav
  • This approach works; however, if I change mean to std, all values in column B become 0, so I can no longer filter on it. Any other thoughts? – bgeaibreyi Jul 01 '21 at 19:57
  • I ended up creating a data frame that has all the statistics of column A and inserting column B from the groupby.mean to make this work. – bgeaibreyi Jul 01 '21 at 22:35
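One possible way to address the std issue raised in the comments is to aggregate each column with its own function, so that B keeps its original value while A gets whatever statistics are needed. This is a sketch using pandas named aggregation, not the method from the answer above; the names `run_id`, `stats`, `A_mean` and `A_std` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
    "B": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
})

# Label each run of identical consecutive B values.
run_id = (df["B"].shift() != df["B"]).cumsum()

# Take B's first value per run (it is constant within a run),
# and compute the statistics on A only.
stats = df.groupby(run_id).agg(
    B=("B", "first"),
    A_mean=("A", "mean"),
    A_std=("A", "std"),
)

# B is untouched by std, so filtering on it still works.
stats = stats[stats["B"] == 1].reset_index(drop=True)
print(stats)
```

Because B is never passed through std, the filter on B == 1 keeps working regardless of which statistics are computed for A.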
1

Keywords: groupby, shift, mean


Code:

df_result=df.groupby((df['B'].shift(1,fill_value=0)!= df['B']).cumsum()).mean()
df_result=df_result[df_result['B']!=0]
df_result
     A    B
1  2.0  1.0
3  3.0  1.0

As you might have noticed, you first need to identify the blocks of consecutive rows that have the same value. One way to do so is to shift B by one row and compare the result with B itself.

df['B_shifted']=df['B'].shift(1,fill_value=0) # fill_value=0 keeps the int dtype and replaces the leading NaN with 0
df['A']                     =[2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
df['B']                     =[0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df['B_shifted']             =[0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
(df['B_shifted'] != df['B'])=[F, T, F, F, T, F, T, F, F, T, F]
                                [↑      ][↑   ][↑      ][↑   ] 

Now we can use the groupby pandas method as follows:

df_grouped=df.groupby((df['B_shifted'] != df['B']).cumsum())

Now if we loop over the DataFrameGroupBy object df_grouped, we'll see the following tuples:

(0,    A  B  B_shifted
0  2  0          0)
(1,    A  B  B_shifted
1  3  1          0
2  1  1          1
3  2  1          1)
(2,    A  B  B_shifted
4  4  0          1
5  1  0          0)
(3,    A  B  B_shifted
6  5  1          0
7  3  1          1
8  1  1          1)
(4,     A  B  B_shifted
9   7  0          1
10  5  0          0)
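The tuples above can be reproduced with a loop along these lines. This is a self-contained sketch that rebuilds the question's data; the lists `keys` and `sizes` are added here only to make the grouping easy to check.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
    "B": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
})
df["B_shifted"] = df["B"].shift(1, fill_value=0)

# Each True in the comparison starts a new block; cumsum turns that into a block id.
df_grouped = df.groupby((df["B_shifted"] != df["B"]).cumsum())

keys, sizes = [], []
for key, block in df_grouped:
    # Each iteration yields (block id, sub-DataFrame holding that block's rows).
    print((key, block))
    keys.append(key)
    sizes.append(len(block))

print(keys)   # [0, 1, 2, 3, 4]
print(sizes)  # [1, 3, 2, 3, 2]
```

The block ids start at 0 here because fill_value=0 makes the first comparison False.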
 

We can now simply calculate the mean and filter out the rows where B is zero, as follows:

df_result=df_grouped.mean()
df_result=df_result[df_result['B']!=0][['A','B']]


mohamadmansourx