I have a minute-resolution dataset running from 01.01.2017 00:00:00.000 to 06.10.2017 23:59:00.000.

It looks like this:

Gmt time, Open, Close
01.01.2017 00:00:00.000, 1.05148, 1.05153
01.01.2017 00:01:00.000, 1.05148, 1.05153
01.01.2017 00:02:00.000, 1.05148, 1.05153
...., ...., ....
01.01.2017 23:58:00.000, 1.05148, 1.05153
19.06.2017 23:59:00.000, 1.05148, 1.05153

First I sort the data:

df = df.sort_values('Gmt time')

Then I convert the column to datetimes:

df['Gmt time'] = pd.DatetimeIndex(df['Gmt time'])

Then I run:

df['Gmt time'].describe()

I get the wrong first and last values. Not only that, it looks like pd.DatetimeIndex misparsed some of the dates.

The new first and last values are:

first     2017-01-01 00:00:00
last      2017-12-06 23:59:00

But my last value was supposed to be 2017-06-19 23:59:00.

What could have caused this?
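
One plausible mechanism, sketched below with only the standard library: if each string is read month-first wherever that is possible and day-first only when the first number cannot be a month, then 12.06.2017 becomes 6 December while 13.06.2017 through 19.06.2017 stay in June. The helper `parse_month_first` and the sample strings are hypothetical; this is a simplified model of lenient per-element parsing, not pandas' actual implementation.

```python
from datetime import datetime

def parse_month_first(text):
    """Simplified model: try month-first, fall back to day-first."""
    for fmt in ("%m.%d.%Y %H:%M:%S.%f", "%d.%m.%Y %H:%M:%S.%f"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unparseable timestamp: {text!r}")

# Hypothetical day-first samples in the question's layout.
samples = [
    "01.01.2017 00:00:00.000",
    "12.06.2017 23:59:00.000",  # 12 is a valid month, so it is read as December 6
    "13.06.2017 00:00:00.000",  # 13 is not a valid month, so it falls back to June 13
    "19.06.2017 23:59:00.000",
]
parsed = [parse_month_first(s) for s in samples]

print(min(parsed))  # earliest parsed value
print(max(parsed))  # latest parsed value, no longer the true last row
```

Under this model the maximum is 2017-12-06 23:59:00, which matches the unexpected last value reported above.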

TEST CASES:

I want to add a test case that may help. If the dataset covers a complete year, from 01.01.2017 00:00:00.000 to 12.31.2017 23:59:00.000, then I get the correct values. That dataset is a full year of minute data.

floss
  • Did you parse it with the `dayfirst` parameter? It defaults to False, but your dataset looks day-first. – G. Anderson Jul 11 '19 at 21:08
  • @G.Anderson could you please explain it a bit more? What do you mean by `dayfirst`? Here `Gmt time` is the `datetime` timestamp at exact minute resolution. – floss Jul 11 '19 at 21:21
  • According to [the docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html): "dayfirst : bool, default False ; If True, parse dates in data with the day first order". So try setting that parameter to True, since the dates you're parsing are in day-first format (`dd.mm.yyyy` instead of `mm.dd.yyyy`), as follows: `pd.DatetimeIndex(df['Gmt time'], dayfirst=True)`, and see if that helps. – G. Anderson Jul 11 '19 at 21:27
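
A minimal sketch of the day-first idea from the comments, using an explicit format string with `pd.to_datetime` so that nothing is guessed, and sorting only after parsing. The sample rows are hypothetical and the format string assumes every row follows the `dd.mm.yyyy HH:MM:SS.fff` layout shown in the question.

```python
import pandas as pd

# Hypothetical sample rows in the question's dd.mm.yyyy layout, deliberately unsorted.
df = pd.DataFrame({
    "Gmt time": [
        "19.06.2017 23:59:00.000",
        "01.01.2017 00:00:00.000",
        "12.06.2017 23:59:00.000",
    ],
    "Open": [1.05148, 1.05148, 1.05148],
    "Close": [1.05153, 1.05153, 1.05153],
})

# Parse first, with an explicit day-first format.
df["Gmt time"] = pd.to_datetime(df["Gmt time"], format="%d.%m.%Y %H:%M:%S.%f")

# Sort after parsing: sorting the raw strings would order them as text, not as dates.
df = df.sort_values("Gmt time").reset_index(drop=True)

first = df["Gmt time"].min()
last = df["Gmt time"].max()
print(first, last)
```

With these sample rows the last value stays in June rather than moving to December. An explicit format also fails loudly on a row that does not match, instead of silently swapping day and month.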

0 Answers