1

I have a dataframe, with duplicated index, which I'm trying to reindex according to rowname of another dataframe. However, due to the duplicated index problem it's telling me cannot reindex on an axis with duplicate labels when I do df2 = df2.reindex(df.index).

Below is what my df2 looks like

index column1 column2
A sample_a 1
B sample_a 3
B sample_b 2
C sample_c 3

So I want to make my index unique by adding prefix to it by values from column1. If column1 is sample_a, I want to make index as a_index; if column1 is sample_b, I want to change index as b_index, to change my df2 as below:

index column1 column2
a_A sample_a 1
a_B sample_a 3
b_B sample_b 2
c_C sample_c 3

I do see ways of adding prefix to index but I'm not sure how to selectively add different prefix?

Thank you!

Joy Zheng
  • 49
  • 6

2 Answers2

2

If your other DataFrame also has both "index" and "column1", why not use both as index?

Alternatively, you can de-duplicate "index" using str.extract to get the id from "column1" that is after the underscore:

df['index'] = (df['column1']
                 .str.extract('_([^_]+)$', expand=False)
                 .add('_'+df['index'])
              )

Updated dataframe:

  index   column1  column2
0   a_A  sample_a        1
1   a_B  sample_a        3
2   b_B  sample_b        2
3   c_C  sample_c        3

regex:

_        # match underscore
([^_]+)  # capture non-underscore characters
$        # match end of line

regex demo

mozway
  • 194,879
  • 13
  • 39
  • 75
1

You can use str.slice

df2['index'] = df2['column1'].str.slice(-1) + "_" + df2['index']

Output:

  index   column1  column2
0   a_A  sample_a        1
1   a_B  sample_a        3
2   b_B  sample_b        2
3   c_C  sample_c        3


As a side note:

You can also guarantee a unique index by using pandas dataframe .cumcount

Like this:

df2['index'] = df2['index'] + df2.groupby(['index'])['index'].cumcount().astype(str)

Which produces this kind of output:

  index   column1  column2
0    A0  sample_a        1
1    B0  sample_a        3
2    B1  sample_b        2
3    C0  sample_c        3
ScottC
  • 3,941
  • 1
  • 6
  • 20