I am trying to obtain daily averages from an irregular time series from a csv-file.
The data in the csv-file start at 13:00 on 20 September 2013 and run till 10:57 on 14 January 2014:
Time Values
20/09/2013 13:00 5.133540
20/09/2013 13:01 5.144993
20/09/2013 13:02 5.158208
20/09/2013 13:03 5.170542
20/09/2013 13:04 5.167899
20/09/2013 13:25 5.168780
20/09/2013 13:26 5.179351
...
I import them with:
import pandas as pd
data = pd.read_csv('<file name>', parse_dates={'Timestamp':'Time']},index_col='Timestamp')
This results in
Values
Timestamp
2013-09-20 13:00:00 5.133540
2013-09-20 13:01:00 5.144993
2013-09-20 13:02:00 5.158208
2013-09-20 13:03:00 5.170542
2013-09-20 13:04:00 5.167899
2013-09-20 13:25:00 5.168780
2013-09-20 13:26:00 5.179351
...
And then I do
dataDailyAv = data.resample('D', how = 'mean')
This results in
Values
Timestamp
2013-01-10 8.623744
2013-01-11 NaN
2013-01-12 NaN
2013-01-13 NaN
2013-01-14 NaN
...
In other words, the result contains dates that do not appear in the original data, and for some of these dates (e.g. 10 January 2013), there even appears a value.
Any ideas about what is going wrong?
Thanks.
Edit: apparently something goes wrong with the parsing of the date: 01/10/2013 is interpreted as 10 January 2013 instead of 1 October 2013. This can be solved by editing the date format in the csv-file, but is there a way to specify the date format in read_csv?