
I have a dataframe like this:

    seq                         score
0   TAAGAATTGTTCTCTGTGTATTT     -23.19
1   AAGAATTGTTCTCTGTGTATTTC     -3.67
2   AGAATTGTTCTCTGTGTATTTCA     -16.49
3   GAATTGTTCTCTGTGTATTTCAG     -11.83
4   AATTGTTCTCTGTGTATTTCAGG     -10.86
5   ATTGTTCTCTGTGTATTTCAGGC     -7.24

I want to go through the rows in consecutive blocks of 3 and, for each block, keep the row with the maximum score.

The result I am looking for is like this:


    seq                          score
1   AAGAATTGTTCTCTGTGTATTTC     -3.67
5   ATTGTTCTCTGTGTATTTCAGGC     -7.24

I tried using groupby and sorting, but that does not seem to work because every value in the seq column is unique.

What other method can I use to get such result?

jezrael
rshar
  • Hi Ranu! Could you please reformat the data in your question to make it more legible for us? As of now it is hard to tell what belongs in which column. – Joe Nov 26 '19 at 08:40
  • Also, do you want fixed blocks or a sliding window? Say you have 15 rows of the pattern (digit, string, float): do you want to always grab rows 0-2, 1-3, 2-4, 3-5, etc., or just rows 0-2, 3-5, 6-8, etc.? – tst Nov 26 '19 at 08:43

2 Answers


Create groups by integer division of the index by 3, use DataFrameGroupBy.idxmax to get the index of the max value in each group, and then select those rows with DataFrame.loc:

out = df.loc[df.groupby(df.index // 3)['score'].idxmax()]
print (out)
                       seq  score
1  AAGAATTGTTCTCTGTGTATTTC  -3.67
5  ATTGTTCTCTGTGTATTTCAGGC  -7.24

Details:

print (df.index // 3)
Int64Index([0, 0, 0, 1, 1, 1], dtype='int64')

print (df.groupby(df.index // 3)['score'].idxmax())
0    1
1    5
Name: score, dtype: int64
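
One caveat, as a minimal sketch: `df.index // 3` relies on the default 0..n-1 index. If the frame has some other index (the labels 10, 11, 12, 20, 21, 22 below are made up for illustration, e.g. what might be left after filtering), building the block labels from row positions with `np.arange(len(df)) // 3` should give the intended blocks of 3:

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with a hypothetical non-default index
df = pd.DataFrame(
    {
        "seq": [
            "TAAGAATTGTTCTCTGTGTATTT",
            "AAGAATTGTTCTCTGTGTATTTC",
            "AGAATTGTTCTCTGTGTATTTCA",
            "GAATTGTTCTCTGTGTATTTCAG",
            "AATTGTTCTCTGTGTATTTCAGG",
            "ATTGTTCTCTGTGTATTTCAGGC",
        ],
        "score": [-23.19, -3.67, -16.49, -11.83, -10.86, -7.24],
    },
    index=[10, 11, 12, 20, 21, 22],  # assumed labels, not from the question
)

# Block labels based on row position, not on index labels: 0, 0, 0, 1, 1, 1
blocks = np.arange(len(df)) // 3

# idxmax gives the index label of the highest score within each block;
# .loc then selects those rows
best = df.loc[df.groupby(blocks)["score"].idxmax()]
print(best)
```

Here `df.index // 3` would split the rows into the wrong groups, while the positional labels only depend on row order.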
jezrael
import pandas as pd

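# Rebuild the example data from the question
df = pd.DataFrame({'seq':['TAAGAATTGTTCTCTGTGTATTT','AAGAATTGTTCTCTGTGTATTTC','AGAATTGTTCTCTGTGTATTTCA','GAATTGTTCTCTGTGTATTTCAG','AATTGTTCTCTGTGTATTTCAGG','ATTGTTCTCTGTGTATTTCAGGC'],
                   'score': [-23.19,-3.67,-16.49,-11.83,-10.86,-7.24]})
# Group rows into consecutive blocks of 3 (index // 3) and keep the row with the highest score in each block
df = df.loc[df.groupby(df.index // 3)['score'].idxmax()]
print(df)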
Chrisvdberge