I have epoch times from KDB e.g. 530782044475144833. But any data type I have used in Pandas e.g. int, float64 severely round it up to 5.31e+17, rendering it useless. How do I retain the entire length of the KDB epoch time when reading into Pandas? Thanks in advance.
Comments:

- One solution could be to store only the offset from a baseline and not the entire epoch time. – João Almeida Oct 31 '16 at 14:33
- Have you tried `np.int64`? – chrisaycock Oct 31 '16 at 14:54
- @chrisaycock -- Thanks, Chris. Using `np.int64` returns "TypeError: data type "np.int64" not understood". Using just `int64` returns "ValueError: invalid literal for int() with base 10: b'5.31E+17'". – skafetaur Oct 31 '16 at 16:27
- Thanks, Joao -- that seems to be a feasible workaround. – skafetaur Oct 31 '16 at 16:46
- @skafetaur I assume the "TypeError: data type "np.int64" not understood" is coming from whatever KDB adapter you're using. The adapter itself needs to be able to convert `j` types properly. – Manish Patel Nov 01 '16 at 12:29
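The `ValueError` in the comments (`b'5.31E+17'`) suggests the value had already been written out in scientific notation before pandas ever saw it. As a minimal sketch (assuming the data passes through a CSV-style export, which is a guess about the asker's pipeline), forcing an integer dtype at read time keeps all 64 bits, provided the file still contains the exact integer:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical round trip: the full 19-digit kdb+ long survives only if the
# file (or adapter) delivers it as an integer string, not a float-rounded
# form like 5.31E+17 -- once that rounding happens, the precision is gone.
csv = io.StringIO("ts\n530782044475144833\n")
df = pd.read_csv(csv, dtype={"ts": np.int64})
print(df["ts"].iloc[0])  # 530782044475144833
```

Without the explicit `dtype`, a value already mangled into `5.31E+17` would be parsed as `float64`, whose 53-bit mantissa cannot represent the original integer.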
1 Answer
In the recent (> 4.0) versions of PyQ, kdb+ timestamp vectors can be easily converted to numpy arrays. For example:
>>> p = q('2#.z.P')
>>> p
k('2017.07.05D13:37:41.058130000 2017.07.05D13:37:41.058130000')
>>> p.long
k('552577061058130000 552577061058130000')
>>> a = np.array(p)
>>> a
array(['2017-07-05T13:37:41.058130000', '2017-07-05T13:37:41.058130000'], dtype='datetime64[ns]')
This performs an automatic epoch conversion and does not lose any precision:
>>> p == a
True
The resulting array can be converted to a pandas DatetimeIndex:
>>> pd.DatetimeIndex(a)
DatetimeIndex(['2017-07-05 13:37:41.058130', '2017-07-05 13:37:41.058130'], dtype='datetime64[ns]', freq=None)
or placed in a DataFrame:
>>> pd.DataFrame({'timestamp':a})
timestamp
0 2017-07-05 13:37:41.058130
1 2017-07-05 13:37:41.058130
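If PyQ is not available, the same conversion can be sketched by hand: kdb+ timestamps are signed 64-bit nanosecond counts since the kdb+ epoch, 2000.01.01, while numpy's `datetime64[ns]` counts from 1970-01-01, so shifting the raw long by the kdb+ epoch recovers the timestamp without ever passing through a float (the value below is the one from the question):

```python
import numpy as np
import pandas as pd

# kdb+ timestamps are int64 nanosecond counts since 2000-01-01 (the kdb+
# epoch); numpy's datetime64[ns] counts from 1970-01-01, so shift by the
# kdb+ epoch to convert. view() reinterprets the int64 bits as a
# timedelta64[ns] in place, so no precision is lost.
KDB_EPOCH = np.datetime64("2000-01-01T00:00:00", "ns")

raw = np.array([530782044475144833], dtype=np.int64)  # value from the question
ts = KDB_EPOCH + raw.view("timedelta64[ns]")
idx = pd.DatetimeIndex(ts)
print(idx[0])  # 2016-10-26 07:27:24.475144833
```

This keeps the arithmetic entirely in 64-bit integers, which is what makes it lossless.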

Alexander Belopolsky