I have a DataFrame that holds 2,865,044 entries with a 3-level MultiIndex:
MultiIndex.levels.names = ['year', 'country', 'productcode']
I am trying to reshape the DataFrame into a wide format, but I am getting the error:
ReshapeError: Index contains duplicate entries, cannot reshape
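The error can be reproduced on a toy frame. This is a minimal sketch that assumes the reshape was an unstack() of the productcode level (the question doesn't say which call was used). The values and the value column are invented for illustration. Recent pandas versions raise this as a plain ValueError rather than ReshapeError.

```python
import pandas as pd

# Hypothetical toy data: the key (1994, 'HKG', '9710') appears twice,
# with different numbers in the made-up 'value' column.
data = pd.DataFrame(
    {
        "year": [1994, 1994, 1994],
        "country": ["HKG", "HKG", "USA"],
        "productcode": ["9710", "9710", "9710"],
        "value": [1.0, 2.0, 3.0],
    }
).set_index(["year", "country", "productcode"])

message = None
try:
    # Moving 'productcode' into the columns requires every
    # (year, country, productcode) label to be unique.
    wide = data.unstack("productcode")
except ValueError as err:
    message = str(err)

print(message)
```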
I have used:
data[data.duplicated()]
to identify the lines causing the error, but the rows it lists don't seem to contain any duplicates.
This led me to export my DataFrame with to_csv(), open the data in Stata and run the duplicates list command, which reports that the dataset holds no duplicates.
An example from the sorted CSV file:
year country productcode duplicate
1962 MYS 711 FALSE
1962 MYS 712 TRUE
1962 MYS 721 FALSE
I know it's a long shot, but does anyone have ideas about what might be causing this? The data types of the index levels are year: int, country: str and productcode: str. Could it be how pandas defines the unique groups? Is there a better way to list the offending index entries?
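One likely reason the two checks disagree with the reshape: DataFrame.duplicated() compares column values only and ignores the index, and Stata's duplicates list with no variable list compares all variables. Rows that share a (year, country, productcode) key but hold different values are therefore never flagged. Checking the index itself is more direct. The sketch below uses a hypothetical toy frame; Index.duplicated(keep=False) flags every row whose label occurs more than once, and Index.is_unique gives a quick yes/no.

```python
import pandas as pd

# Hypothetical toy frame indexed the same way as in the question.
data = pd.DataFrame(
    {
        "year": [1962, 1962, 1962, 1994],
        "country": ["MYS", "MYS", "MYS", "HKG"],
        "productcode": ["711", "712", "712", "9710"],
        "value": [10.0, 20.0, 30.0, 40.0],
    }
).set_index(["year", "country", "productcode"])

# Compares the column values only (here just 'value'), so rows that share
# an index label but hold different values are not flagged.
row_dups = data[data.duplicated()]

# Marks every row whose (year, country, productcode) label occurs more
# than once, including the first occurrence.
key_dups = data[data.index.duplicated(keep=False)]

print(len(row_dups))          # 0
print(data.index.is_unique)   # False
print(key_dups)
```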
Update: I have tried resetting the index:
temp = data.reset_index()
dup = temp[temp.duplicated(subset=['year', 'country', 'productcode'])]
and I get a completely different list!
year country productcode
1994 HKG 9710
1994 USA 9710
1995 HKG 9710
1995 USA 9710
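After reset_index() the index levels become ordinary columns, so duplicated(subset=...) now compares the key rather than the data values, which would explain why this list differs from the first one. By default (keep="first") only the second and later copies of a key are listed; keep=False lists all of them. If the repeated keys turn out to be genuine repeated records, one option is to aggregate per key before reshaping. The sketch below assumes summing is meaningful for this data, which the question does not confirm; the toy frame and value column are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame: (1994, 'HKG', '9710') appears twice.
keys = ["year", "country", "productcode"]
data = pd.DataFrame(
    {
        "year": [1994, 1994, 1994, 1995],
        "country": ["HKG", "HKG", "USA", "USA"],
        "productcode": ["9710", "9710", "9710", "9710"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
).set_index(keys)

temp = data.reset_index()

# keep="first" (the default): only the later copies of a repeated key.
later_copies = temp[temp.duplicated(subset=keys)]
# keep=False: every row belonging to a repeated key.
all_copies = temp[temp.duplicated(subset=keys, keep=False)]

# Assumption: summing repeated records is valid here. Aggregating per key
# makes the index unique, so unstack() no longer fails.
wide = data.groupby(level=keys).sum().unstack("productcode")
print(wide)
```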
Update 2 (28 June 2013):
It appears to have been a strange memory issue in my IPython session. This morning's fresh instance works fine and reshapes the data without any changes to yesterday's code! I will debug further if the issue returns and let you know. Does anyone know of a good debugger for IPython sessions?