Any DataFrame can be converted into a NumPy array using the to_numpy() method:
>>> import pandas
>>> df = pandas.DataFrame({'A': [1, 2, 3],
...                        'B': [1.0, 2.0, 3.0],
...                        'C': ['a', 'b', 'c']})
>>> df.to_numpy()
array([[1, 1.0, 'a'],
       [2, 2.0, 'b'],
       [3, 3.0, 'c']], dtype=object)
>>> df['A'].to_numpy()
array([1, 2, 3])
>>> df[['A', 'B']].to_numpy()
array([[1., 1.],
       [2., 2.],
       [3., 3.]])
>>> df[['C']].to_numpy()
array([['a'],
       ['b'],
       ['c']], dtype=object)
So you can simply use pandas and then extract the NumPy array from the resulting DataFrame.
As Parfait points out, you have to be careful about data types when doing the conversion. I left that implicit in the examples above, but notice how the first example generates an array with dtype=object, the second (a single integer column) generates an integer array, and the third (an integer column combined with a float column) generates an ordinary floating-point array. I think a detailed discussion of data types in NumPy is beyond the scope of this question, though.
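If you do want to control the result type rather than rely on what pandas infers, here is a minimal sketch. It reuses the same df as above; the variable names (mixed, numeric, forced, auto) are just illustrative, and the choice of float32 is an assumption for the sake of the example, not something the question requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1.0, 2.0, 3.0],
                   'C': ['a', 'b', 'c']})

# All columns: no common numeric type exists, so pandas falls back to object.
mixed = df.to_numpy()

# Integer and float columns together are upcast to a common float type.
numeric = df[['A', 'B']].to_numpy()

# to_numpy() accepts a dtype argument if you want a specific type.
forced = df[['A', 'B']].to_numpy(dtype=np.float32)

# select_dtypes() keeps only the numeric columns, without naming them by hand.
auto = df.select_dtypes(include='number').to_numpy()

print(mixed.dtype, numeric.dtype, forced.dtype, auto.dtype)
```

Selecting the numeric columns first is usually the simpler route, since asking for a numeric dtype on a frame that still contains the string column C would raise an error during conversion.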