I am trying to extract the timestamps from my inbox in order to generate some statistics with Pandas. My code grabs up to 1000 emails, and stores the timestamps in a list. I then pass the list to pd.DataFrame, which gives me a dataframe with a column of type "time".

I want to use groupby and TimeGrouper in order to plot the number of emails by weekday, time of day, etc., so I set my timestamp column as the index, but I get a TypeError: "Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'". I have tried using to_datetime, but that generates another TypeError: object of type 'time' has no len(). From what I can tell, df[0] is already a datetime object, so why does it throw an error when trying to use TimeGrouper?

import win32com.client
import pandas as pd
import numpy as np

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)  # 6 = olFolderInbox
messages = inbox.Items
message = messages.GetLast()
timesReceived = [message.SentOn]

# Walk backwards through the inbox, collecting up to 1000 more timestamps.
for i in range(1000):
    try:
        message = messages.GetPrevious()
        timesReceived.append(message.SentOn)
    except AttributeError:
        # GetPrevious() returned nothing: no more messages.
        break

df = pd.DataFrame(timesReceived)
df.set_index(df[0], inplace=True)
grouped = df.groupby(pd.TimeGrouper('M'))  # raises the TypeError below


TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

Edit: Adding df.info() and df.head()

df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 150 entries, 04/01/16 09:37:07 to 02/11/16 17:40:56
Data columns (total 1 columns):
0    150 non-null object
dtypes: object(1)
memory usage: 2.3+ KB

df.head()
                                   0
0
04/01/16 09:37:07  04/01/16 09:37:07
04/01/16 04:34:30  04/01/16 04:34:30
04/01/16 03:02:14  04/01/16 03:02:14
04/01/16 02:15:12  04/01/16 02:15:12
04/01/16 00:16:27  04/01/16 00:16:27
Stefan
thobru
  • Would you mind sharing the output of `df.info()` and `df.head()`? – Stefan Apr 01 '16 at 14:28
  • Sure, I've edited my post to include it. Thanks – thobru Apr 01 '16 at 14:58
  • `Index: 150 entries` suggests your `index` column needs to be converted to `datetime` using `pd.to_datetime()` first. `df[0]` may look like `datetime` but needs type conversion; try `df[0] = pd.to_datetime(df[0], format='%m-%d-%Y %H:%M:%S')` before setting it as the index. – Stefan Apr 01 '16 at 14:59
  • @Stefan Thanks a lot. The following seems to have done the trick (changed the format string slightly): `df[0] = pd.to_datetime(df[0], format='%m/%d/%y %H:%M:%S')` `df.info()` now returns `DatetimeIndex: 150 entries`. Thanks for pointing that out. – thobru Apr 01 '16 at 15:33
  • alright, posted as answer so can mark this as solved. – Stefan Apr 01 '16 at 15:35

1 Answer


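Index: 150 entries in the df.info() output shows that the index is a plain object Index rather than a DatetimeIndex, so it needs to be converted to datetime using pd.to_datetime() first.

df[0] may look like datetime, but its dtype is object, so it needs an explicit type conversion. Try

df[0] = pd.to_datetime(df[0], format='%m/%d/%y %H:%M:%S')

before setting it as the index. Note the lowercase %y: the timestamps use a two-digit year (04/01/16), which %Y (four-digit year) would not match.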
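As a minimal sketch of the whole flow, the snippet below uses the five sample timestamps from the df.head() output as plain strings (an assumption, since the real values come from Outlook) and groups with pd.Grouper, because pd.TimeGrouper has been removed from recent pandas releases. The names per_day, per_weekday and per_hour are only illustrative.

```python
import pandas as pd

# Sample values copied from the df.head() output in the question,
# assumed here to be plain strings.
raw = [
    "04/01/16 09:37:07",
    "04/01/16 04:34:30",
    "04/01/16 03:02:14",
    "04/01/16 02:15:12",
    "04/01/16 00:16:27",
]

df = pd.DataFrame(raw)

# %y matches the two-digit year; %Y would expect four digits.
df[0] = pd.to_datetime(df[0], format="%m/%d/%y %H:%M:%S")

# Convert first, then set the index, so it becomes a DatetimeIndex.
df = df.set_index(df[0])

# pd.Grouper(freq=...) is the replacement for pd.TimeGrouper.
per_day = df.groupby(pd.Grouper(freq="D")).size()

# Weekday and hour-of-day counts come straight from the DatetimeIndex.
per_weekday = df.groupby(df.index.day_name()).size()
per_hour = df.groupby(df.index.hour).size()

print(per_day)
print(per_weekday)
print(per_hour)
```

For monthly counts, as in the question, the frequency string can be swapped for a month-end alias ('M' in older pandas; newer releases prefer 'ME').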

Stefan