4

I am trying to obtain daily averages from an irregular time series from a csv-file.

The data in the csv-file start at 13:00 on 20 September 2013 and run till 10:57 on 14 January 2014:

Time                    Values
20/09/2013 13:00        5.133540
20/09/2013 13:01        5.144993
20/09/2013 13:02        5.158208
20/09/2013 13:03        5.170542
20/09/2013 13:04        5.167899    
20/09/2013 13:25        5.168780
20/09/2013 13:26        5.179351
...

I import them with:

import pandas as pd
data = pd.read_csv('<file name>', parse_dates={'Timestamp':'Time']},index_col='Timestamp')

This results in

                           Values
Timestamp                          
2013-09-20 13:00:00        5.133540
2013-09-20 13:01:00        5.144993
2013-09-20 13:02:00        5.158208
2013-09-20 13:03:00        5.170542
2013-09-20 13:04:00        5.167899
2013-09-20 13:25:00        5.168780
2013-09-20 13:26:00        5.179351
...

And then I do

dataDailyAv = data.resample('D', how = 'mean')

This results in

                  Values
Timestamp                 
2013-01-10        8.623744
2013-01-11             NaN
2013-01-12             NaN
2013-01-13             NaN
2013-01-14             NaN
...

In other words, the result contains dates that do not appear in the original data, and for some of these dates (e.g. 10 January 2013), there even appears a value.

Any ideas about what is going wrong?

Thanks.

Edit: apparently something goes wrong with the parsing of the date: 01/10/2013 is interpreted as 10 January 2013 instead of 1 October 2013. This can be solved by editing the date format in the csv-file, but is there a way to specify the date format in read_csv?

Tim Stack
  • 3,209
  • 3
  • 18
  • 39
user2143353
  • 87
  • 1
  • 6
  • Looks worrying that you get data from January 2013 although your time series starts in September 2013. Is there an issue with your rawdata? Maybe something irregular, that you did not observe so far, because the file is so big? The docs for `resample` are a bit thin, unfortunately (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.resample.html)... – Dr. Jan-Philip Gehrcke Jan 16 '14 at 13:51

1 Answers1

2

You want dayfirst=True, one of the many tweaks listed in the read_csv docs.

Dan Allan
  • 34,073
  • 6
  • 70
  • 63