
I have a CSV file with two columns, a timestamp and a value of 0 or 1, like so:

17/08/2012 07:47:16 0
17/08/2012 07:54:31 1
17/08/2012 08:02:31 0
17/08/2012 09:22:33 0
17/08/2012 09:58:05 0
17/08/2012 12:26:59 1
17/08/2012 20:56:00 0
18/08/2012 10:04:06 0
18/08/2012 10:42:52 0
20/08/2012 07:22:02 0
20/08/2012 07:54:28 0
20/08/2012 08:01:58 0
20/08/2012 08:16:31 1
20/08/2012 08:26:38 0
20/08/2012 08:55:19 1
20/08/2012 09:00:09 0 
20/08/2012 09:26:11 0
20/08/2012 09:50:10 0
20/08/2012 10:33:37 0
20/08/2012 10:39:13 0
20/08/2012 10:39:35 1
20/08/2012 11:15:07 1
20/08/2012 11:19:15 0
20/08/2012 11:21:01 0

I load this file into a DataFrame, raw_data, and then set its index to the parsed timestamps:

ts_data=raw_data.set_index(pd.to_datetime(raw_data.when_created,dayfirst=True))
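A minimal, self-contained sketch of this loading step. It assumes the file is comma-separated with a header row. The value column name `converted` is hypothetical (the question only names `when_created`), and the rows are a short excerpt of the data above. Only the numeric column is kept, so later sums do not touch the original string column.

```python
import io

import pandas as pd

# Excerpt of the question's data. Header names are assumptions:
# "when_created" comes from the question, "converted" is hypothetical.
csv_text = """when_created,converted
17/08/2012 07:47:16,0
17/08/2012 07:54:31,1
18/08/2012 10:42:52,0
20/08/2012 07:22:02,0
20/08/2012 08:16:31,1
"""

raw_data = pd.read_csv(io.StringIO(csv_text))

# Parse the "dd/mm/yyyy hh:mm:ss" strings day-first and use them as the index.
ts_data = raw_data.set_index(pd.to_datetime(raw_data.when_created, dayfirst=True))

# Keep only the numeric column; summing the original string column is not meaningful.
ts_data = ts_data[["converted"]]

print(ts_data)
```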

I then try to downsample the data using:

daily_conversions=ts_data.resample('D',how='sum')
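In current pandas the `how=` keyword of `resample` has been deprecated and removed. Instead, the aggregation is called on the resampler object. The sketch below assumes a recent pandas and uses a hand-built excerpt with a hypothetical `converted` column. Note that `.sum()` returns 0 for a day with no rows, whereas `.sum(min_count=1)` returns NaN, which matches the output shown in the question.

```python
import pandas as pd

# Hand-built excerpt: note there are no rows on 19/08/2012.
idx = pd.to_datetime(
    ["17/08/2012 07:54:31", "18/08/2012 10:04:06",
     "20/08/2012 08:16:31", "20/08/2012 08:55:19"],
    dayfirst=True,
)
ts_data = pd.DataFrame({"converted": [1, 0, 1, 1]}, index=idx)

# Modern spelling of resample('D', how='sum'): empty days sum to 0.
daily = ts_data.resample("D").sum()

# min_count=1 turns days with no rows into NaN instead of 0.
daily_nan = ts_data.resample("D").sum(min_count=1)

print(daily_nan)
```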

It works for all days (there are more than 7 months of data; I have only included a subset here) except one, for which I get this output:

2012-08-20     NaN

This does not make sense given the data. Interestingly, if I downsample using a higher frequency such as 'h', I get correct results for that specific day: NaN for hours with no rows, 0 for hours whose rows are all 0, and the correct sum for hours that contain rows equal to 1. Any ideas?
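The hourly behaviour described above can be reproduced with three consecutive rows from 17/08/2012 in the data. This is a sketch that assumes a recent pandas. Because plain `.sum()` there returns 0 for empty bins, it passes `min_count=1` so that hours with no rows come out as NaN, as they did in the question.

```python
import pandas as pd

# Three consecutive rows from the question's data (17/08/2012).
idx = pd.to_datetime(
    ["17/08/2012 09:58:05", "17/08/2012 12:26:59", "17/08/2012 20:56:00"],
    dayfirst=True,
)
conversions = pd.Series([0, 1, 0], index=idx)

# Hourly bins: NaN where an hour has no rows, 0.0 where its rows are all 0,
# and the actual sum otherwise.
hourly = conversions.resample("h").sum(min_count=1)

print(hourly)
```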

    Are you sure it didn't return 2012-08-19 with the NaN value? There is no 8/19 data so that would make sense, and it's what I got when I ran the same code you posted. – bdiamante Apr 04 '13 at 22:05
  • Thank you, bdiamante, you helped me see what was wrong. I was focused on what was wrong with the 20th and did not notice that the 19th was missing. – luckyfool Apr 05 '13 at 06:30
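One way to spot a gap like the missing 19th is to compare the calendar days that actually occur in the index with a complete daily range. This is a sketch using three rows from the question's data. The column name `converted` is hypothetical.

```python
import pandas as pd

idx = pd.to_datetime(
    ["17/08/2012 20:56:00", "18/08/2012 10:42:52", "20/08/2012 07:22:02"],
    dayfirst=True,
)
ts_data = pd.DataFrame({"converted": [0, 0, 0]}, index=idx)

# Calendar days that have at least one row (times truncated to midnight).
days_present = ts_data.index.normalize().unique()

# Every day between the first and last observed day.
all_days = pd.date_range(days_present.min(), days_present.max(), freq="D")

# Days in the full range with no rows at all.
missing_days = all_days.difference(days_present)

print(missing_days)
```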

1 Answer


After the helpful comment above, I realised what was wrong: it is just a matter of labelling. The day that actually returns NaN is the 19th, which has no data, but the default setting is label='right', so that empty bin was labelled as the 20th. When I add label='left', it works fine. Thanks.
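A small sketch of the labelling effect, using rows from the 18th and 20th. It assumes `closed='left'`, so each daily bin covers [midnight, next midnight), and only `label` changes between the two calls. In recent pandas versions the default for 'D' is `closed='left'`, `label='left'`, so both arguments are passed explicitly here to make the difference visible. `min_count=1` keeps the empty day as NaN.

```python
import pandas as pd

idx = pd.to_datetime(
    ["18/08/2012 10:04:06", "18/08/2012 10:42:52",
     "20/08/2012 07:22:02", "20/08/2012 08:16:31"],
    dayfirst=True,
)
conversions = pd.Series([0, 0, 0, 1], index=idx)

# Same daily bins, labelled by their left edge (start of the day).
by_left = conversions.resample("D", closed="left", label="left").sum(min_count=1)

# Same bins, labelled by their right edge (start of the following day).
by_right = conversions.resample("D", closed="left", label="right").sum(min_count=1)

print(by_left)
print(by_right)
```

With `label='right'` the empty bin covering the 19th is reported as 2012-08-20, which is the output seen in the question. With `label='left'` it is reported as 2012-08-19.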
