pandas DataFrame reset_index which can handle duplicate column names?

Question

Is there an equivalent of pandas.DataFrame.reset_index() that operates on the columns and can handle duplicate column names? I want it to throw away the column names and return a default numbered index 0, 1, 2, ... for the columns. (Methods like df.rename or df.reindex_axis do not work when I have duplicate column names.)

Sample input:

 df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'A', 'B'])

     A   A   B
0   0.5 0.3 0.9
1   0.7 0.9 0.3
2   0.9 0.4 0.8
3   0.6 0.2 0.9
4   0.7 0.4 0.6

Expected output:

     0   1   2
0   0.8 0.1 0.2
1   0.4 0.2 0.4
2   0.3 0.3 0.4
3   0.4 0.1 0.8
4   1.0 0.9 0.9

What is your use case for having duplicate column names? That's generally a very bad idea. You can just strip the column names at the start, save them and add them back at the end if needed. — smci, Jul 18 '19 at 23:49
@smci I know it is a very bad idea, this is why I want to reset the index :) This use case came from concatenating columns from different multiindex dataframes — FLab, Jul 19 '19 at 08:35
Then **you should do a [`join`/`merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join) on the dataframes**. Not a simple `concat`. — smci, Jul 19 '19 at 19:15

jezrael · Answer 1 · 2016-07-21T10:52:48.667

5

Assign a range as long as the number of columns (df.shape[1]) to df.columns:

df.columns = range(df.shape[1])
print(df)
          0         1         2
0  0.228080  0.884450  0.753401
1  0.176790  0.741979  0.525305
2  0.680255  0.730258  0.449681
3  0.169420  0.660825  0.986554
4  0.302204  0.040413  0.902899
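
A self-contained sketch of the same assignment, assuming pandas and NumPy are installed. The seed and variable names are illustrative and not from the original post:

```python
import numpy as np
import pandas as pd

# Frame with a duplicated column label, as in the question.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for the example
df = pd.DataFrame(rng.random((5, 3)), columns=["A", "A", "B"])

# Overwrite the column labels with positions 0..n-1.
# Duplicates are not a problem because no label is looked up by name.
df.columns = range(df.shape[1])

print(list(df.columns))
```

Note that this mutates df in place, which is what the asker wanted to avoid inside an expression.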

Another solution is to transpose with T, call reset_index with drop=True, and transpose back:

df = df.T.reset_index(drop=True).T
print(df)
          0         1         2
0  0.024846  0.688193  0.887926
1  0.284681  0.895319  0.142876
2  0.440834  0.299527  0.762815
3  0.936967  0.928907  0.642960
4  0.801077  0.085773  0.866651
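
One caveat worth checking before relying on the double transpose: if the columns have different dtypes, transposing upcasts them to a common dtype. A small sketch with made-up data (the frame below is an assumption, not the asker's data):

```python
import pandas as pd

# Mixed dtypes: two float columns sharing a label and one string column.
df = pd.DataFrame([[1.0, 2.0, "x"], [3.0, 4.0, "y"]], columns=["A", "A", "B"])

# Transpose, drop the former column labels, transpose back.
reset = df.T.reset_index(drop=True).T

print(list(reset.columns))
print(df.dtypes.iloc[0], reset.dtypes.iloc[0])  # the float column no longer has a float dtype
```

For an all-float frame like the one in the question this does not matter; for mixed frames, calling infer_objects() on the result may recover the original dtypes.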

edited Jul 21 '16 at 10:52

answered Jul 21 '16 at 10:44


As written in the question, I want to avoid assigning new values to columns. In particular, I want to do this operation in the context of a dictionary comprehension, where I create the dataframe by concatenating time series and then changing the name of the columns. – FLab Jul 21 '16 at 10:45
OK, then use the second solution. Unfortunately `reset_index` doesn't work on columns, so double transposing is needed. – jezrael Jul 21 '16 at 10:54
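
For the dictionary-comprehension use case described in the comment above, a different technique may fit: when the frame is built with pd.concat along axis=1, passing ignore_index=True numbers the resulting columns 0..n-1 without any later assignment. This is a sketch under assumptions; the `groups` dict and its contents are hypothetical stand-ins for the asker's time series:

```python
import pandas as pd

# Hypothetical input: named groups of series, some sharing a name.
groups = {
    "g1": [pd.Series([1.0, 2.0], name="A"), pd.Series([3.0, 4.0], name="A")],
    "g2": [pd.Series([5.0, 6.0], name="A"), pd.Series([7.0, 8.0], name="B")],
}

# ignore_index=True discards the labels along the concatenation axis,
# so each frame gets columns 0, 1, ... regardless of duplicate names.
frames = {
    key: pd.concat(series, axis=1, ignore_index=True)
    for key, series in groups.items()
}

print(list(frames["g1"].columns))
```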

score 5 · Accepted Answer · answered Jul 21 '16 at 11:14

You can use the set_axis() method:

In [54]: df
Out[54]:
          A         A         B
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993

In [55]: df = df.set_axis(range(len(df.columns)), axis=1)  # pandas >= 0.21: labels first, returns a new frame

In [56]: df
Out[56]:
          0         1         2
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993
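
In recent pandas versions set_axis takes the labels first and returns a new DataFrame rather than modifying the original, so it can be used inside an expression. A minimal sketch, with illustrative data that is not from the original post:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=["A", "A", "B"])

# Labels first, axis by keyword; the result is a new DataFrame.
renumbered = df.set_axis(range(len(df.columns)), axis=1)

print(list(renumbered.columns))
print(list(df.columns))  # the original frame keeps its labels
```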
