I have the following Pandas dataframe:

df.head()

Output

id  unplug_hourDateTime
0   2018-09-01 01:00:00+02:00
1   2018-03-01 01:00:00+02:00
2   2018-03-01 01:00:00+02:00
3   2018-04-01 01:00:00+02:00
4   2018-04-01 01:00:00+02:00

My objective is to build a calmap graph based on the number of records per day, so I need a dataframe whose index is a DatetimeIndex, TimedeltaIndex or PeriodIndex.

I wrote the following:

df['unplug_Date']=df['unplug_hourDateTime'].map(lambda x : x.date())
df_calmap=df['unplug_Date'].value_counts().to_frame()
df_calmap.head()

Output

               unplug_Date
2018-09-20   16562
2018-09-13   16288
2018-09-19   16288
2018-09-12   16092
2018-09-27   16074

At first glance this looks like what I wanted, but when I execute calmap.calendarplot(df_calmap) with the calmap package I get an error, which I suppose is due to the format of the index.

AttributeError: 'Index' object has no attribute 'year'

How can I force the dataframe to use the index column as DatetimeIndex? I have found this interesting answer but I can't understand how to use df = df.set_index(pd.DatetimeIndex(df['b'])) with the already existing index and not with a new column.

Nicolaesse

1 Answer

The calmap documentation states that it sums per day by default, so you don't have to change your datetime field to a date field. Just convert your unplug_hourDateTime column to a DatetimeIndex as follows. My example uses method chaining, which means everything is done in one go:

df_calmap = (df
    .assign(unplug_hourDateTime=pd.DatetimeIndex(df['unplug_hourDateTime']))
    .groupby('unplug_hourDateTime')
    .size()
    .to_frame('count')
)

calmap.calendarplot(df_calmap['count'])
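
To see what the chain produces, here is a minimal sketch on a made-up three-row sample (the timestamps are invented for illustration, not taken from the real data). The `resample("D").sum()` line is only my way of previewing a per-day sum like the one the calmap documentation describes; it is not a call into calmap itself.

```python
import pandas as pd

# Made-up sample standing in for the question's df (assumption).
df = pd.DataFrame({
    "unplug_hourDateTime": pd.to_datetime([
        "2018-09-01 01:00:00+02:00",
        "2018-09-01 05:00:00+02:00",
        "2018-09-02 01:00:00+02:00",
    ])
})

# Same chain as above: count the rows per timestamp.
df_calmap = (df
    .assign(unplug_hourDateTime=pd.DatetimeIndex(df["unplug_hourDateTime"]))
    .groupby("unplug_hourDateTime")
    .size()
    .to_frame("count")
)

# The index is now a DatetimeIndex, so it has a .year attribute.
print(type(df_calmap.index).__name__)

# Preview of a per-day sum of the counts.
daily = df_calmap["count"].resample("D").sum()
print(daily)
```

If the daily totals look right here, passing `df_calmap['count']` to `calmap.calendarplot` should no longer raise the `AttributeError` about `year`.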

Of course, you can also use Josh Friedlander's nice answer:

df.index = pd.DatetimeIndex(df.index)
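
Applied to the df_calmap frame from the question, that conversion might look like the sketch below. The three sample dates are made up for illustration; the point is only that the existing index can be converted directly, without going through a new column.

```python
import datetime as dt

import pandas as pd

# Made-up stand-in for df['unplug_Date'], which holds plain datetime.date objects.
dates = pd.Series(
    [dt.date(2018, 9, 20), dt.date(2018, 9, 20), dt.date(2018, 9, 13)],
    name="unplug_Date",
)

df_calmap = dates.value_counts().to_frame()

# value_counts() on date objects gives a generic object-dtype Index,
# which has no .year attribute.
print(type(df_calmap.index).__name__, df_calmap.index.dtype)

# Convert the existing index in place.
df_calmap.index = pd.DatetimeIndex(df_calmap.index)
print(type(df_calmap.index).__name__)
```

After the conversion, the first column of df_calmap can be passed to calmap.calendarplot as a Series.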
Sander van den Oord