1

I would like to use something equivalent to pandas "read_sql_query", to load the data from the database into a numpy array, not a pandas dataframe.

In pandas I use something like:

df = pd.read_sql_query(sql="select * from db;", con=con, index_col='index')

And now I would need a function like:

A = read_sql_to_np(sql="select * from db;")

where A is a numpy array.

Henry
  • 93
  • 1
  • 10
  • 1
    does this help? [What's the most efficient way to convert a MySQL result set to a NumPy array?](https://stackoverflow.com/questions/7061824/whats-the-most-efficient-way-to-convert-a-mysql-result-set-to-a-numpy-array) – anky Jan 28 '20 at 15:02

1 Answers1

3

Any dataframe can be converted into a numpy array using the to_array() method:

>>> df = pandas.DataFrame({'A': [1, 2, 3], 
                           'B': [1.0, 2.0, 3.0], 
                           'C': ['a', 'b', 'c']})
>>> df.to_numpy()
array([[1, 1.0, 'a'],
       [2, 2.0, 'b'],
       [3, 3.0, 'c']], dtype=object)
>>> df['A'].to_numpy()
array([1, 2, 3])
>>> df[['A', 'B']].to_numpy()
array([[1., 1.],
       [2., 2.],
       [3., 3.]])
>>> df[['C']].to_numpy()
array([['a'],
       ['b'],
       ['c']], dtype=object)

So you can simply use pandas and then extract the numpy array from the resulting dataframe.

As Parfait points out, you have to be careful about data types when doing the conversion. I left that implicit in the example above, but notice how the first example generates an array with dtype=object, whereas the second generates an ordinary floating point array. I think a detailed discussion of data types in numpy is beyond the scope of this question though.

senderle
  • 145,869
  • 36
  • 209
  • 233
  • 2
    Let me know if there is some reason you need to avoid pandas entirely. But in the vast majority of situations I would recommend this over rolling your own SQL-to-numpy converter. – senderle Jan 28 '20 at 15:07
  • 1
    @Henry, Remember too a database table most likely resembles a pandas dataframe (columns of varying types) than a numpy array (columns and rows of same type). So naturally, a DB API would be set up for pandas and not numpy. – Parfait Jan 28 '20 at 15:14
  • 1
    BTW - if your database table does resemble a matrix (all types the same), you may not be exercising database normalization! – Parfait Jan 28 '20 at 15:16