
I have a dataframe and want to convert it into a numpy array to plot its values. The dataframe looks like this:

>>> df_ohlc
                        open       high        low      close
Date                                                           
2018-03-07 03:35:00  62.189999  62.189999  62.169998  62.180000
2018-03-07 03:36:00  62.180000  62.180000  62.160000  62.180000
2018-03-07 03:37:00  62.169998  62.220001  62.169998  62.209999
2018-03-07 03:38:00  62.220001  62.220001  62.189999  62.200001
...
[480 rows x 4 columns]

>>> df_ohlc.index
DatetimeIndex(['2018-03-07 03:35:00', '2018-03-07 03:36:00',
            '2018-03-07 03:37:00', '2018-03-07 03:38:00',
            '2018-03-07 03:39:00', '2018-03-07 03:40:00',
            '2018-03-07 03:41:00', '2018-03-07 03:42:00',
            '2018-03-07 03:43:00', '2018-03-07 03:44:00',
            ...
            '2018-03-07 11:25:00', '2018-03-07 11:26:00',
            '2018-03-07 11:27:00', '2018-03-07 11:28:00',
            '2018-03-07 11:29:00', '2018-03-07 11:30:00',
            '2018-03-07 11:31:00', '2018-03-07 11:32:00',
            '2018-03-07 11:33:00', '2018-03-07 11:34:00'],
            dtype='datetime64[ns]', name='Date', length=480, freq='T')

>>> df_ohlc.index[0]
Timestamp('2018-03-07 03:35:00', freq='T')  # why is this a Timestamp when the index reported dtype=datetime64[ns] just above?
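A DatetimeIndex stores its values in a `datetime64[ns]` array, but pandas "boxes" scalars: taking a single element returns a pandas `Timestamp`. The raw NumPy values are reachable through `.to_numpy()` (or `.values`), and indexing that array yields `numpy.datetime64` scalars instead. A small sketch with a made-up three-minute index standing in for `df_ohlc.index`:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_ohlc.index (same minute frequency and name)
idx = pd.date_range("2018-03-07 03:35", periods=3, freq="min", name="Date")

print(idx.dtype)        # datetime64[ns]: the dtype of the underlying storage
print(type(idx[0]))     # pandas Timestamp: scalar access is boxed by pandas

raw = idx.to_numpy()    # plain NumPy array, no boxing
print(raw.dtype)        # datetime64[ns]
print(type(raw[0]))     # numpy.datetime64
```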

But when I try to convert it, the index (the Date column) changes from datetime64[ns] to Timestamp, and the whole array ends up with dtype=object:

>>> df_ohlc.reset_index().values
array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
        62.189998626708984, 62.16999816894531, 62.18000030517578],
    [Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
        62.18000030517578, 62.15999984741211, 62.18000030517578],
    [Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
        62.220001220703125, 62.16999816894531, 62.209999084472656],
    ..., 
    [Timestamp('2018-03-07 11:32:00'), 61.939998626708984,
        61.95000076293945, 61.93000030517578, 61.93000030517578],
    [Timestamp('2018-03-07 11:33:00'), 61.93000030517578,
        61.939998626708984, 61.900001525878906, 61.90999984741211],
    [Timestamp('2018-03-07 11:34:00'), 61.90999984741211,
        61.91999816894531, 61.900001525878906, 61.91999816894531]], dtype=object)

Why does it happen and how can I keep the type as datetime64?
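`reset_index()` turns the index into an ordinary `datetime64[ns]` column, but `.values` (or `.to_numpy()`) has to return one array with a single dtype. NumPy has no dtype that holds both `datetime64` and `float64`, so pandas falls back to `object` and stores each date as a `Timestamp`. One way to keep `datetime64`, sketched here on made-up data with the question's column names, is to take the index and the price columns out as two separate arrays:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_ohlc: three rows, DatetimeIndex named "Date"
idx = pd.date_range("2018-03-07 03:35", periods=3, freq="min", name="Date")
df_ohlc = pd.DataFrame(
    {"open": [62.19, 62.18, 62.17], "high": [62.19, 62.18, 62.22],
     "low": [62.17, 62.16, 62.17], "close": [62.18, 62.18, 62.21]},
    index=idx,
)

mixed = df_ohlc.reset_index().to_numpy()   # datetime64 + float64 -> object
print(mixed.dtype)                          # object

dates = df_ohlc.index.to_numpy()            # datetime64[ns], one entry per row
prices = df_ohlc.to_numpy()                 # float64, shape (n_rows, 4)
print(dates.dtype, prices.dtype, prices.shape)   # datetime64[ns] float64 (3, 4)
```

The two arrays share the same row order, so `dates[i]` belongs to `prices[i]`.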

I tried separating the dataframe's index and concatenating it with the values afterwards, but that raises an error. I'd like to know what I did wrong.

>>> index_ohlc = np.array([ df_ohlc.index.values.astype('datetime64[s]'), ]).T

>>> index_ohlc.shape
(480, 1)

>>> value_ohlc = df_ohlc.values     

>>> value_ohlc.shape
(480, 4)

>>> type(index_ohlc)
<class 'numpy.ndarray'>

>>> type(value_ohlc)
<class 'numpy.ndarray'>

>>> new_array = np.concatenate( (index_ohlc, value_ohlc), axis = 1 )
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: invalid type promotion
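`np.concatenate` has to pick one dtype for the joined array, and NumPy defines no promotion between `datetime64[s]` and `float64`, hence the TypeError (newer NumPy versions word the message differently, but the exception is still a TypeError subclass). If a single numeric array is really needed, one option is to turn the timestamps into numbers first, for example integer seconds since the Unix epoch. The sketch below uses made-up values shaped like `index_ohlc` and `value_ohlc`, and assumes that numeric representation is acceptable for the plot:

```python
import numpy as np

# Made-up stand-ins shaped like index_ohlc (n, 1) and value_ohlc (n, 4)
index_ohlc = np.array(["2018-03-07T03:35:00", "2018-03-07T03:36:00"],
                      dtype="datetime64[s]").reshape(-1, 1)
value_ohlc = np.array([[62.19, 62.19, 62.17, 62.18],
                       [62.18, 62.18, 62.16, 62.18]])

try:
    np.concatenate((index_ohlc, value_ohlc), axis=1)
except TypeError as exc:                  # datetime64 and float64 don't promote
    print("concatenate failed:", exc)

# Convert to integer seconds since 1970-01-01 so every column is numeric
seconds = index_ohlc.astype("int64")
combined = np.concatenate((seconds, value_ohlc), axis=1)  # int64 + float64 -> float64
print(combined.dtype, combined.shape)     # float64 (2, 5)

# The datetimes can be recovered from the first column when needed
dates_back = combined[:, 0].astype("int64").astype("datetime64[s]")
```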
edited by John Zwinck, asked by maynull
  • As long as your array has mixed types (datetime as well as float), its dtype isn't going to be anything other than object. I'd recommend taking the index out separately from the values. – cs95 Mar 07 '18 at 10:58
  • @cᴏʟᴅsᴘᴇᴇᴅ Thank you for your advice. I think I had tried what you said and got a TypeError. Do you happen to know what caused it? – maynull Mar 07 '18 at 11:08
  • No, I don't have the code that produces that error... – cs95 Mar 07 '18 at 11:09
  • I did you a favor and deleted the second, unrelated question that followed your first one. Feel free to post it as a separate question (it's actually easier to answer than the first). – John Zwinck Mar 07 '18 at 12:41
  • @John Zwinck Thank you! – maynull Mar 09 '18 at 00:16

1 Answer


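Try a NumPy structured array. Each field of a structured array has its own dtype, so the dates can stay datetime64[ns] alongside the float64 prices.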

Demo

import numpy as np
import pandas as pd
from pandas import Timestamp

# Rebuild a few rows of the question's output (columns are labelled 0..4)
df = pd.DataFrame(np.array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
        62.189998626708984, 62.16999816894531, 62.18000030517578],
    [Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
        62.18000030517578, 62.15999984741211, 62.18000030517578],
    [Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
        62.220001220703125, 62.16999816894531, 62.209999084472656]]))

# Structured dtype: one named field per column, each with its own type
dt = np.dtype([("Date", 'datetime64[ns]'),
               ("f1", np.float64),
               ("f2", np.float64),
               ("f3", np.float64),
               ("f4", np.float64)])

# Each row must be a tuple to fill one record of the structured array
arr = np.array([tuple(v) for v in df.values.tolist()], dtype=dt)
print(repr(arr))

array([('2018-03-07T03:35:00.000000000', 62.18999863, 62.18999863, 62.16999817, 62.18000031),
       ('2018-03-07T03:36:00.000000000', 62.18000031, 62.18000031, 62.15999985, 62.18000031),
       ('2018-03-07T03:37:00.000000000', 62.16999817, 62.22000122, 62.16999817, 62.20999908)],
      dtype=[('Date', '<M8[ns]'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
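When starting from the original `df_ohlc`, the tuples don't have to be built by hand: pandas' `DataFrame.to_records()` returns a NumPy record array in which the index becomes its own field. With a naive DatetimeIndex that field should come out as `datetime64[ns]` while the price columns stay `float64` (worth checking on your pandas version). A sketch on made-up data with the question's column names:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_ohlc with a DatetimeIndex named "Date"
idx = pd.date_range("2018-03-07 03:35", periods=3, freq="min", name="Date")
df_ohlc = pd.DataFrame(
    {"open": [62.19, 62.18, 62.17], "high": [62.19, 62.18, 62.22],
     "low": [62.17, 62.16, 62.17], "close": [62.18, 62.18, 62.21]},
    index=idx,
)

rec = df_ohlc.to_records(index=True)   # the index becomes the "Date" field
print(rec.dtype)                       # each field keeps its own dtype

dates = rec["Date"]                    # datetime64 array, usable as x values
closes = rec["close"]                  # float64 array
```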
answered by Tai