
I have a DataFrame with 6000 rows and one column. I need to find two occurrences of the same element in the column such that the distance between them is as large as possible. An example with a list would be:

list = [2,1,3,1,2,4,5,1,3,2,1,5]

I would like the output to be the pair:

(list[1], list[10])

Any idea? Thank you guys!

Ch3steR
  • 20,090
  • 4
  • 28
  • 58



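You can try this: reset the index so each row's position becomes an 'index' column, then group by value and use GroupBy.agg to take the first and last position of each value. For any value, its first and last occurrences are the pair that are furthest apart.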

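import pandas as pd

lst = [2,1,3,1,2,4,5,1,3,2,1,5]
df = pd.DataFrame(lst, columns=['vals'])

# reset_index() exposes each row's position as an 'index' column;
# group by value and take the first and last position of each
df = df.reset_index().groupby('vals').agg(['first', 'last'])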
df
     index
     first last
vals
1        1   10
2        0    9
3        2    8
4        5    5
5        6   11

# The result has MultiIndex columns ('index', 'first') and ('index', 'last');
# keep only the second level to get single-level columns
df.columns=df.columns.get_level_values(1)
df
      first  last
vals
1         1    10
2         0     9
3         2     8
4         5     5
5         6    11
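The table gives each value's widest pair, but the question asks for the single pair with the largest distance overall. A minimal sketch of that last step, assuming the same lst as above (the names occ, distance and best_val are only illustrative): subtract first from last and take idxmax. In this sample, values 1 and 2 both reach a distance of 9; idxmax returns the first maximum in the sorted group index, so it picks value 1, which matches the pair (list[1], list[10]) from the question.

```python
import pandas as pd

lst = [2, 1, 3, 1, 2, 4, 5, 1, 3, 2, 1, 5]
df = pd.DataFrame(lst, columns=['vals'])

# First and last position of each distinct value (single-level columns)
occ = df.reset_index().groupby('vals')['index'].agg(['first', 'last'])

# The widest pair for a value is its first and last occurrence
occ['distance'] = occ['last'] - occ['first']

# idxmax returns the label of the first maximum; since groupby sorts
# the values, ties go to the smallest value
best_val = occ['distance'].idxmax()
i, j = occ.loc[best_val, 'first'], occ.loc[best_val, 'last']
print(best_val, i, j)  # 1 1 10
```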

Alternatively, you can use pd.NamedAgg for named aggregation (available since pandas 0.25):

df = pd.DataFrame(lst, columns=['vals'])  # rebuild the original frame; df was overwritten above
df.reset_index().groupby('vals').agg(
    first_occurrence=pd.NamedAgg(column='index', aggfunc='first'),
    last_occurrence=pd.NamedAgg(column='index', aggfunc='last'),
)

      first_occurrence  last_occurrence
vals
1                   1              10
2                   0               9
3                   2               8
4                   5               5
5                   6              11
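If you only need the answer and not a DataFrame, the same first/last idea works in a single pass over the data in plain Python. A sketch, with widest_pair as a hypothetical helper name: first occurrences are recorded with setdefault, last occurrences are overwritten until the end, and max picks the widest. Ties go to the value encountered first in the data, so on this sample it returns value 2 at positions 0 and 9, which has the same distance (9) as the pair in the question.

```python
def widest_pair(values):
    """Hypothetical helper: return (value, first_index, last_index) for the
    value whose first and last occurrences are furthest apart.
    Ties go to the value that appears first in the data."""
    first = {}
    last = {}
    for i, v in enumerate(values):
        first.setdefault(v, i)  # recorded only the first time v is seen
        last[v] = i             # overwritten until the final occurrence
    # max() returns the first maximal key in insertion order
    best = max(first, key=lambda v: last[v] - first[v])
    return best, first[best], last[best]


lst = [2, 1, 3, 1, 2, 4, 5, 1, 3, 2, 1, 5]
print(widest_pair(lst))  # (2, 0, 9)
```

For the column in the question, passing df['vals'].tolist() from the original frame should behave the same, assuming a default RangeIndex so positions match index labels.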

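If you want the (first, last) pair as a tuple in a new column, one per row, use Series.apply on the original frame:

df = pd.DataFrame(lst, columns=['vals'])  # rebuild the original frame
# For each row's value, look up the first and last index where it occurs
df['occurrences'] = df.vals.apply(lambda x: (df.index[df.vals == x][0],
                                             df.index[df.vals == x][-1]))
df
    vals occurrences
0      2      (0, 9)
1      1     (1, 10)
2      3      (2, 8)
3      1     (1, 10)
4      2      (0, 9)
5      4      (5, 5)
6      5     (6, 11)
7      1     (1, 10)
8      3      (2, 8)
9      2      (0, 9)
10     1     (1, 10)
11     5     (6, 11)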
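The apply version rescans the whole column twice for every row, so its cost grows roughly with the square of the number of rows. A sketch of a linear-time alternative, assuming the original frame with a default index (pos, grouped, first and last are illustrative names): group the row positions by value and let GroupBy.transform broadcast each group's first and last position back to every row.

```python
import pandas as pd

lst = [2, 1, 3, 1, 2, 4, 5, 1, 3, 2, 1, 5]
df = pd.DataFrame(lst, columns=['vals'])

# Row labels as a Series, grouped by the value in each row
pos = df.index.to_series()
grouped = pos.groupby(df['vals'])

# transform returns a Series aligned with df: each row gets its
# group's first and last position
first = grouped.transform('first')
last = grouped.transform('last')

# Pair them up as plain-int tuples, one per row
df['occurrences'] = [(int(a), int(b)) for a, b in zip(first, last)]
print(df)
```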