2

I want to extract rows when the column x value remains the same for more than five consecutive rows.

     x  x2
0    5   5
1    4   5
2   10   6
3   10   5
4   10   6
5   10  78
6   10  89
7   10  78
8   10  98
9   10   8
10  10  56
11  60  45
12  10  65

Desired_output:

    x  x2
0  10   6
1  10   5
2  10   6
3  10  78
4  10  89
5  10  78
6  10  98
7  10   8
8  10  56
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
Nickel
  • 580
  • 4
  • 19

2 Answers2

2

you can use shift to compare the next row and then take cumulative sum to compare if the repeat is greater than 5, then group on x and transform any then mask with the condition to unselect rows where condition does not match.

c = df['x'].eq(df['x'].shift())
out = df[c.cumsum().gt(5).groupby(df['x']).transform('any') & (c|c.shift(-1))]

print(out)

     x  x2
2   10   6
3   10   5
4   10   6
5   10  78
6   10  89
7   10  78
8   10  98
9   10   8
10  10  56
anky
  • 74,114
  • 11
  • 41
  • 70
2

You can use .shift + .cumsum to identify the blocks of consecutive rows where column x value remains same, then group the dataframe on these blocks and transform using count to identify the groups which have greater than 5 consecutive same values in x:

b = df['x'].ne(df['x'].shift()).cumsum()
df_out = df[df['x'].groupby(b).transform('count').gt(5)]

Details:

>>> b
0     1
1     2
2     3
3     3
4     3
5     3
6     3
7     3
8     3
9     3
10    3
11    4
12    5
Name: x, dtype: int64

>>> df_out
     x  x2
2   10   6
3   10   5
4   10   6
5   10  78
6   10  89
7   10  78
8   10  98
9   10   8
10  10  56
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53