Equivalent of pandas read_sql_query for numpy array?

Question

I would like to use something equivalent to pandas "read_sql_query", to load the data from the database into a numpy array, not a pandas dataframe.

In pandas I use something like:

df = pd.read_sql_query(sql="select * from db;", con=con, index_col='index')

And now I would need a function like:

A = read_sql_to_np(sql="select * from db;")

where A is a numpy array.

does this help? [What's the most efficient way to convert a MySQL result set to a NumPy array?](https://stackoverflow.com/questions/7061824/whats-the-most-efficient-way-to-convert-a-mysql-result-set-to-a-numpy-array) — anky, Jan 28 '20 at 15:02

senderle · Accepted Answer · 2020-01-28T23:04:06.007

3

Any dataframe can be converted into a numpy array using the to_array() method:

>>> df = pandas.DataFrame({'A': [1, 2, 3], 
                           'B': [1.0, 2.0, 3.0], 
                           'C': ['a', 'b', 'c']})
>>> df.to_numpy()
array([[1, 1.0, 'a'],
       [2, 2.0, 'b'],
       [3, 3.0, 'c']], dtype=object)
>>> df['A'].to_numpy()
array([1, 2, 3])
>>> df[['A', 'B']].to_numpy()
array([[1., 1.],
       [2., 2.],
       [3., 3.]])
>>> df[['C']].to_numpy()
array([['a'],
       ['b'],
       ['c']], dtype=object)

So you can simply use pandas and then extract the numpy array from the resulting dataframe.

As Parfait points out, you have to be careful about data types when doing the conversion. I left that implicit in the example above, but notice how the first example generates an array with dtype=object, whereas the second generates an ordinary floating point array. I think a detailed discussion of data types in numpy is beyond the scope of this question though.

edited Jan 28 '20 at 23:04

answered Jan 28 '20 at 15:02

senderle

145,869
36
209
233

2

Let me know if there is some reason you need to avoid pandas entirely. But in the vast majority of situations I would recommend this over rolling your own SQL-to-numpy converter. – senderle Jan 28 '20 at 15:07
1

@Henry, Remember too a database table most likely resembles a pandas dataframe (columns of varying types) than a numpy array (columns and rows of same type). So naturally, a DB API would be set up for pandas and not numpy. – Parfait Jan 28 '20 at 15:14
1

BTW - if your database table does resemble a matrix (all types the same), you may not be exercising database normalization! – Parfait Jan 28 '20 at 15:16

Equivalent of pandas read_sql_query for numpy array?

1 Answers1