Columns index has to be unique for fixed format error in pandas time series

Question

I got ValueError: Columns index has to be unique for fixed format when I tried to save a DataFrame that I built by combining multiple time series DataFrames. This is a sample of what I did:

df1=pd.concat([d1,d2,d3,d4],axis=1)
df2=pd.DataFrame(d5)
df3=pd.concat([d6,d7,d8],axis=1)

main_df=pd.concat([df1,df2,df3],axis=1)
main_df=main_df.dropna()
main_df.head()

Up to here it works fine, but when I try to save the data to an HDF5 file I get this error: Columns index has to be unique for fixed format

fi=pd.HDFStore("data.h5")
fi['df']=main_df #this line causes the error

Do you need the duplicate column names? The simplest solution is to rename the duplicated columns. - jezrael, Jun 21 '17 at 06:42
Yes, I have some columns with the same name. What is the best way to rename all the duplicated columns? I have a very big dataset (>30 columns). - Eka, Jun 21 '17 at 06:45

jezrael · Accepted Answer · 2017-06-21T06:51:59.893

3

You can use cumcount to number the duplicates, replace the 0 with an empty string, and append the result to the original column names:

import pandas as pd

df = pd.DataFrame([[1,2,3,4]], columns=list('abbc'))
print(df)
   a  b  b  c
0  1  2  3  4

# number each repeat of a name (0, 1, 2, ...), blank out the 0, append the rest
s = df.columns.to_series()
df.columns = s + s.groupby(s).cumcount().astype(str).replace({'0':''})
print(df)
   a  b  b1  c
0  1  2   3  4
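
For a frame built with pd.concat like the one in the question, the same idea can be wrapped in a small helper. This is only a sketch: dedupe_columns is a hypothetical name, the two input frames are made-up stand-ins for the real data, and it uses pd.Series(df.columns) instead of to_series() so the intermediate Series has a plain integer index.

```python
import pandas as pd

def dedupe_columns(df):
    """Return a copy of df whose repeated column names get a numeric suffix.

    Hypothetical helper, not part of pandas.
    """
    s = pd.Series(df.columns)
    # cumcount numbers each occurrence of a name: 0 for the first, 1 for the second, ...
    counts = s.groupby(s).cumcount()
    # keep the suffix only for repeats; the first occurrence keeps its name
    suffix = counts.astype(str).where(counts > 0, "")
    out = df.copy()
    out.columns = list(s.astype(str) + suffix)
    return out

# made-up frames that share the column name "b"
left = pd.DataFrame({"a": [1], "b": [2]})
right = pd.DataFrame({"b": [3], "c": [4]})
combined = pd.concat([left, right], axis=1)  # columns: a, b, b, c

fixed = dedupe_columns(combined)
print(list(fixed.columns))  # ['a', 'b', 'b1', 'c']
```

Checking fixed.columns.is_unique before writing to the store confirms the rename worked. Note that a generated name such as b1 could still clash with a column that is already called b1, so that check is worth keeping.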

edited Jun 21 '17 at 06:51

answered Jun 21 '17 at 06:46
