I got ValueError: Columns index has to be unique for fixed format when I tried to save a dataframe that I built by combining multiple time series dataframes. Here is a sample of what I did:

df1=pd.concat([d1,d2,d3,d4],axis=1)
df2=pd.DataFrame(d5)
df3=pd.concat([d6,d7,d8],axis=1)

main_df=pd.concat([df1,df2,df3],axis=1)
main_df=main_df.dropna()
main_df.head()

Up to this point everything works fine, but when I try to save the data to an HDF5 file I get this error: Columns index has to be unique for fixed format

fi=pd.read_hdf("data.h5")
fi['df']=main_df  # this line causes the error
Eka
  • Do you need the duplicate column names? The simplest solution is to rename the duplicated columns. – jezrael Jun 21 '17 at 06:42
  • Yes, I have some columns with the same name. What is the best way to rename all the duplicated columns? I have a very big dataset with more than 30 columns. – Eka Jun 21 '17 at 06:45

1 Answer

You can use cumcount to number the duplicates, replace 0 with an empty string if necessary, and append the result to the original column names:

df = pd.DataFrame([[1,2,3,4]], columns = list('abbc'))
print (df)
   a  b  b  c
0  1  2  3  4

s = df.columns.to_series()
df.columns = s + s.groupby(s).cumcount().astype(str).replace({'0':''})
print (df)
   a  b  b1  c
0  1  2   3  4
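
For a wider frame like the one in the question, the same idea can be wrapped in a small helper and checked before saving. This is only a minimal sketch: make_columns_unique is a hypothetical helper name, and the open/close column names are made-up sample data, not taken from the question.

```python
import pandas as pd


def make_columns_unique(df):
    """Return a copy of df where repeated column names get a numeric suffix.

    Hypothetical helper: the first occurrence keeps its name, later ones
    become name1, name2, ... (same cumcount idea as above).
    """
    names = pd.Series(list(df.columns)).astype(str)
    # Position of each name within its group of identical names: 0, 1, 2, ...
    counts = names.groupby(names).cumcount()
    # Empty suffix for the first occurrence, the counter for the rest
    suffix = counts.astype(str).where(counts > 0, '')
    out = df.copy()
    out.columns = (names + suffix).tolist()
    return out


# Made-up sample frame with repeated column names
main_df = pd.DataFrame([[1, 2, 3, 4, 5]],
                       columns=['open', 'close', 'open', 'close', 'open'])

# Which names occur more than once?
dupes = main_df.columns[main_df.columns.duplicated()].unique().tolist()
print(dupes)

fixed = make_columns_unique(main_df)
print(fixed.columns.tolist())
print(fixed.columns.is_unique)
```

Once columns.is_unique is True, the uniqueness check behind the error should no longer trigger. Note that this sketch does not guard against a generated name such as open1 colliding with a column that already has that name.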
jezrael