How to get rows when specific column value is continous for certain number of rows

Question

I want to extract rows when the column x value remains the same for more than five consecutive rows.

     x  x2
0    5   5
1    4   5
2   10   6
3   10   5
4   10   6
5   10  78
6   10  89
7   10  78
8   10  98
9   10   8
10  10  56
11  60  45
12  10  65

Desired_output:

    x  x2
0  10   6
1  10   5
2  10   6
3  10  78
4  10  89
5  10  78
6  10  98
7  10   8
8  10  56

anky · Answer 1 · 2021-01-31T08:36:57.650

you can use shift to compare the next row and then take cumulative sum to compare if the repeat is greater than 5, then group on x and transform any then mask with the condition to unselect rows where condition does not match.

c = df['x'].eq(df['x'].shift())
out = df[c.cumsum().gt(5).groupby(df['x']).transform('any') & (c|c.shift(-1))]

Shubham Sharma · Accepted Answer · 2021-01-31T08:43:49.163

2

You can use .shift + .cumsum to identify the blocks of consecutive rows where column x value remains same, then group the dataframe on these blocks and transform using count to identify the groups which have greater than 5 consecutive same values in x:

b = df['x'].ne(df['x'].shift()).cumsum()
df_out = df[df['x'].groupby(b).transform('count').gt(5)]

Details:

>>> b
0     1
1     2
2     3
3     3
4     3
5     3
6     3
7     3
8     3
9     3
10    3
11    4
12    5
Name: x, dtype: int64

>>> df_out
     x  x2
2   10   6
3   10   5
4   10   6
5   10  78
6   10  89
7   10  78
8   10  98
9   10   8
10  10  56

edited Jan 31 '21 at 08:43

answered Jan 31 '21 at 08:35

Shubham Sharma

68,127
6
24
53

i only want to get consecutive rows when value of x is 10 only , if other values have consecutive rows i dont need – Nickel Jan 31 '21 at 08:46
In that case you can use `df[df['x'].groupby(b).transform('count').gt(5) & df['x'].eq(10)]` – Shubham Sharma Jan 31 '21 at 08:48

How to get rows when specific column value is continous for certain number of rows

2 Answers2