I have epoch times from KDB e.g. 530782044475144833. But any data type I have used in Pandas e.g. int, float64 severely round it up to 5.31e+17, rendering it useless. How do I retain the entire length of the KDB epoch time when reading into Pandas? Thanks in advance.
Comments:

- One solution could be to store only the offset from a baseline and not the entire epoch time. – João Almeida Oct 31 '16 at 14:33
- Have you tried `np.int64`? – chrisaycock Oct 31 '16 at 14:54
- @chrisaycock -- Thanks, Chris. Using `np.int64` returns "TypeError: data type "np.int64" not understood". Using just `int64` returns "ValueError: invalid literal for int() with base 10: b'5.31E+17'". – skafetaur Oct 31 '16 at 16:27
- Thanks, Joao -- that seems to be a feasible workaround. – skafetaur Oct 31 '16 at 16:46
- @skafetaur I assume the "TypeError: data type "np.int64" not understood" is coming from whatever KDB adapter you're using. The adapter itself needs to be able to convert `j` types properly. – Manish Patel Nov 01 '16 at 12:29
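The `ValueError` in the comments (`b'5.31E+17'`) suggests the value had already been written out in scientific notation before pandas ever saw it. As a minimal sketch (assuming the data passes through a CSV-style export, which is a guess about the asker's pipeline), forcing an integer dtype at read time keeps all 64 bits, provided the file still contains the exact integer:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical round trip: the full 19-digit kdb+ long survives only if the
# file (or adapter) delivers it as an integer string, not a float-rounded
# form like 5.31E+17 -- once that rounding happens, the precision is gone.
csv = io.StringIO("ts\n530782044475144833\n")
df = pd.read_csv(csv, dtype={"ts": np.int64})
print(df["ts"].iloc[0])  # 530782044475144833
```

Without the explicit `dtype`, a value already mangled into `5.31E+17` would be parsed as `float64`, whose 53-bit mantissa cannot represent the original integer.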
1 Answer
In the recent (> 4.0) versions of PyQ, kdb+ timestamp vectors can be easily converted to numpy arrays. For example:
>>> p = q('2#.z.P')
>>> p
k('2017.07.05D13:37:41.058130000 2017.07.05D13:37:41.058130000')
>>> p.long
k('552577061058130000 552577061058130000')
>>> a = np.array(p)
>>> a
array(['2017-07-05T13:37:41.058130000', '2017-07-05T13:37:41.058130000'], dtype='datetime64[ns]')
This performs an automatic epoch conversion and does not lose any precision:
>>> p == a
True
The resulting array can be converted to a pandas DatetimeIndex:
>>> pd.DatetimeIndex(a)
DatetimeIndex(['2017-07-05 13:37:41.058130', '2017-07-05 13:37:41.058130'], dtype='datetime64[ns]', freq=None)
or placed in a DataFrame:
>>> pd.DataFrame({'timestamp':a})
timestamp
0 2017-07-05 13:37:41.058130
1 2017-07-05 13:37:41.058130
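If PyQ is not available, the same conversion can be sketched by hand: kdb+ timestamps are signed 64-bit nanosecond counts since the kdb+ epoch, 2000.01.01, while numpy's `datetime64[ns]` counts from 1970-01-01, so shifting the raw long by the kdb+ epoch recovers the timestamp without ever passing through a float (the value below is the one from the question):

```python
import numpy as np
import pandas as pd

# kdb+ timestamps are int64 nanosecond counts since 2000-01-01 (the kdb+
# epoch); numpy's datetime64[ns] counts from 1970-01-01, so shift by the
# kdb+ epoch to convert. view() reinterprets the int64 bits as a
# timedelta64[ns] in place, so no precision is lost.
KDB_EPOCH = np.datetime64("2000-01-01T00:00:00", "ns")

raw = np.array([530782044475144833], dtype=np.int64)  # value from the question
ts = KDB_EPOCH + raw.view("timedelta64[ns]")
idx = pd.DatetimeIndex(ts)
print(idx[0])  # 2016-10-26 07:27:24.475144833
```

This keeps the arithmetic entirely in 64-bit integers, which is what makes it lossless.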

Alexander Belopolsky