I have the following pd.DataFrame called df:
                  date  cluster_label  value
0  2018-11-14 02:16:22              0    1.5
1  2018-11-14 02:16:22              0    7.0
2  2018-11-14 02:16:22              0    2.5
3  2018-11-14 02:16:22              1    3.0
4  2018-11-14 02:16:22              1    0.5
5  2018-11-14 02:16:22              2    1.0
I run the following commands to reduce the date column to month-and-year values and then set a multi-level index on the dataframe:
self.df['date'] = self.df['date'].dt.to_period('M')
self.df.set_index(['cluster_label', 'date'], inplace=True)
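A self-contained version of the steps above, for reproducing the issue. It rebuilds the sample data by hand and drops the `self.` prefix; it assumes `date` is already a datetime64 column, which `.dt.to_period` requires.

```python
import pandas as pd

# Rebuild the sample data shown above (standalone, no `self.` prefix)
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-11-14 02:16:22"] * 6),
    "cluster_label": [0, 0, 0, 1, 1, 2],
    "value": [1.5, 7.0, 2.5, 3.0, 0.5, 1.0],
})

# Truncate each timestamp to a monthly Period, then build the two-level index
df["date"] = df["date"].dt.to_period("M")
df.set_index(["cluster_label", "date"], inplace=True)

print(df)
```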
Now, the output is like so:
                       value
cluster_label date
0             2018-11    1.5
              2018-11    7.0
              2018-11    2.5
1             2018-11    3.0
              2018-11    0.5
2             2018-11    1.0
But this is not what I want: the date label is repeated on every row. I want each date to be shown only once per cluster_label, so the output should look like:
                       value
cluster_label date
0             2018-11    1.5
                         7.0
                         2.5
1             2018-11    3.0
                         0.5
2             2018-11    1.0
What am I doing wrong and how can I change my code to get this desired output?
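For reference, a hedged note on the display behaviour involved: with the default option `display.multi_sparse=True`, pandas blanks repeated labels only in the outer levels of a MultiIndex, and the innermost level (here `date`) is always printed. The index itself holds six (cluster_label, date) pairs either way; only the printed form differs. A minimal sketch of one possible workaround, assuming an extra positional level is acceptable: add a hypothetical `row` counter (computed with `groupby(...).cumcount()`) as a third level, so that `date` is no longer innermost and its repeats are hidden in the printout.

```python
import pandas as pd

# Sample data from the question, with dates already reduced to monthly periods
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-11-14 02:16:22"] * 6).to_period("M"),
    "cluster_label": [0, 0, 0, 1, 1, 2],
    "value": [1.5, 7.0, 2.5, 3.0, 0.5, 1.0],
})

# Hypothetical extra level "row": position of each row within its
# (cluster_label, date) group, e.g. 0, 1, 2 for cluster 0
df["row"] = df.groupby(["cluster_label", "date"]).cumcount()
df = df.set_index(["cluster_label", "date", "row"])

# Sparsification applies to the outer levels, which now include "date"
with pd.option_context("display.multi_sparse", True):
    print(df)
```

The trade-off is an extra index level; if that is unwanted, the original two-level index is already correct and only its printed form differs.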