1

I have Multiindex dataframe and I want to reindex it. However, I get 'duplicate axis error'.

Product  Date            col1
A        September 2019     5
         October 2019       7
B        September 2019     2
         October 2019       4

How can I achieve output like this?

Product  Date            col1
A        January 2019      0
         February 2019     0
         March 2019        0
         April 2019        0
         May 2019          0
         June 2019         0
         July 2019         0
         August 2019       0
         September 2019    5
         October 2019      7
B        January 2019      0
         February 2019     0
         March 2019        0
         April 2019        0
         May 2019          0
         June 2019         0
         July 2019         0
         August 2019       0
         September 2019    2
         October 2019      4 

First I tried this:

nested_df = nested_df.reindex(annual_date_range, level = 1, fill_value = 0)

Secondly,

nested_df = nested_df.reset_index().set_index('Date')
nested_df  = nested_df.reindex(annual_date_range, fill_value = 0)
JuniorESE
  • 267
  • 1
  • 7

2 Answers2

0

You should do the following for each month:

df.loc[('A', 'January 2019'), :] = (0)
df.loc[('B', 'January 2019'), :] = (0)
0

Let df1 be your first data frame with non-zero values. The approach is to create another data frame df with zero values and merge both data frames to obtain the result.

dates = ['{month}-2019'.format(month=month) for month in range(1,9)]*2
length = int(len(dates)/2)
products = ['A']*length + ['B']*length
Col1 = [0]*len(dates)
df = pd.DataFrame({'Dates': dates, 'Products': products, 'Col1':Col1}).set_index(['Products','Dates'])

Now the MultiIndex is converted to datetime:

df.index.set_levels(pd.to_datetime(df.index.get_level_values(1)[:8]).strftime('%m-%Y'), level=1,inplace=True)

In df1 you have to do the same, i.e. change the datetime multiindex level to the same format:

df1.index.set_levels(pd.to_datetime(df1.index.get_level_values(1)[:2]).strftime('%m-%Y'), level=1,inplace=True)

I did it because otherwise (for example if datetimes are formatted like %B %y) the sorting of the MultiIndex by months goes wrong. Now it is sufficient to merge both data frames:

result = pd.concat([df1,df]).sort_values(['Products','Dates'])

The final move is to change the datetime format:

result.index.set_levels(levels = pd.to_datetime(result.index.get_level_values(1)[:10]).strftime('%B %Y'), level=1, inplace=True)
treskov
  • 328
  • 1
  • 4
  • 17