0

I'm trying to create a way to search through a column and if it's a duplicate value, replace a string. This is what I have so far

date
November 1st 2020
November 2nd 2020
November 3rd 2020
November 1st 2020
November 2nd 2020
November 3rd 2020
November 1st 2020
November 2nd 2020
November 3rd 2020

And what I'm trying to do is this

date
November 1st 2020 first instance
November 2nd 2020 first instance
November 3rd 2020 first instance
November 1st 2020 second instance
November 2nd 2020 second instance
November 3rd 2020 second instance
November 1st 2020 third instance
November 2nd 2020 third instance
November 3rd 2020 third instance

Is there a way to do this? Replace duplicate values across columns in Pandas This will only add "1 character" to a dataframe, so would you create a loop in order to keep adding 1 or what ever you would want to it? Something like

pd.Series(['date']).duplicated()
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
for is_duplicate
   df.where(~is_duplicate, +1)

I don't really understand how to iterate through the columns to get the desired result.

SpindriftSeltzer
  • 322
  • 3
  • 12

1 Answers1

2

IIUC, you can do it using groupby with cumcount and map.

df['date'] += ' ' + df.groupby('date').cumcount().map({0:'first ', 1:'second ', 2: 'third '}) + 'instance'

Output:

                                date
0   November 1st 2020 first instance
1   November 2nd 2020 first instance
2   November 3rd 2020 first instance
3  November 1st 2020 second instance
4  November 2nd 2020 second instance
5  November 3rd 2020 second instance
6   November 1st 2020 third instance
7   November 2nd 2020 third instance
8   November 3rd 2020 third instance
Scott Boston
  • 147,308
  • 15
  • 139
  • 187