I haven't used recfromcsv, but looking at its code I see that it uses np.genfromtxt, followed by a masked records construction.
I'd suggest giving a small sample of the csv text (3 or so lines) and showing the resulting data. We need to see the dtype in particular.
It may also be useful to start with genfromtxt, skipping the masked array stuff for now. I don't think masking is the sticking point in converting dtypes in structured arrays.
In any case, we need something more concrete to explore.
You can't change the dtype of structured fields in-place. You have to make a new array with a new dtype, and copy values from the old to the new.
The numpy.lib.recfunctions module (import numpy.lib.recfunctions as rf) has some functions that can help in changing structured arrays.
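As a minimal sketch of the "new array, copy values" idea (the sample array and the target dtype here are made up for illustration, not taken from the question), copying field by field could look like this:

```python
import numpy as np

# Hypothetical sample array: one bytes field and two integer fields.
old = np.array([(b'a', 1, 2), (b'b', 6, 9)],
               dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])

# Target dtype: same field names, integers replaced by floats.
new_dt = np.dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8')])

# Allocate a new array and copy the values over one field at a time.
new = np.zeros(old.shape, dtype=new_dt)
for name in old.dtype.names:
    new[name] = old[name]

print(new.dtype)
```

The original array is left untouched; only the new one has the float fields.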
===========
I suspect that it will be simpler to spell out the dtypes when calling genfromtxt than to change dtypes in an existing array.
You could try one read with dtype=None and a limited number of lines to get the column count and base dtype. Then edit that dtype, substituting floats for ints as needed, and read the whole file with the new dtype. Look in the recfunctions code if you need ideas on how to edit dtypes.
For example:
In [504]: txt=b"""a, 1, 2, 4\nb, 6, 9, 10\nc, 4, 4, 3"""
In [506]: arr = np.genfromtxt(txt.splitlines(), dtype=None, delimiter=',')
In [507]: arr
Out[507]:
array([(b'a', 1, 2, 4), (b'b', 6, 9, 10), (b'c', 4, 4, 3)],
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
In [508]: arr.dtype.descr
Out[508]: [('f0', '|S1'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')]
A crude dtype editor:
def foo(tup):
    name, dtype = tup
    dtype = dtype.replace('S','U')
    dtype = dtype.replace('i','f')
    return name, dtype
And applying this to default dtype:
In [511]: dt = [foo(tup) for tup in arr.dtype.descr]
In [512]: dt
Out[512]: [('f0', '|U1'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<f4')]
In [513]: arr = np.genfromtxt(txt.splitlines(), dtype=dt, delimiter=',')
In [514]: arr
Out[514]:
array([('a', 1.0, 2.0, 4.0), ('b', 6.0, 9.0, 10.0), ('c', 4.0, 4.0, 3.0)],
      dtype=[('f0', '<U1'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<f4')])
In [522]: arr = np.recfromcsv(txt.splitlines(), dtype=dt, delimiter=',',case_sensitive=True,usemask=True,names=None)
In [523]: arr
Out[523]:
masked_records(
f0 : ['a' 'b' 'c']
f1 : [1.0 6.0 4.0]
f2 : [2.0 9.0 4.0]
f3 : [4.0 10.0 3.0]
fill_value : ('N', 1.0000000200408773e+20, 1.0000000200408773e+20, 1.0000000200408773e+20)
)
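Putting the two-pass idea into a script: the sketch below probes the first couple of rows with max_rows, edits the guessed dtype, and then reads everything. The widen helper is hypothetical (my name, not a numpy function); it checks the dtype kind instead of doing string replacement, so it should cope whether the string column is guessed as bytes or unicode.

```python
import numpy as np

txt = b"a, 1, 2, 4\nb, 6, 9, 10\nc, 4, 4, 3"
lines = txt.splitlines()

# Pass 1: read only a couple of rows and let genfromtxt guess the fields.
probe = np.genfromtxt(lines, dtype=None, delimiter=',', max_rows=2)

def widen(name, fmt):
    """Hypothetical helper: ints become float64, bytes become unicode."""
    dt = np.dtype(fmt)
    if dt.kind == 'i':
        return name, 'f8'
    if dt.kind == 'S':
        return name, 'U%d' % dt.itemsize
    return name, fmt

new_dt = [widen(name, fmt) for name, fmt in probe.dtype.descr]

# Pass 2: read the whole input with the edited dtype.
arr = np.genfromtxt(lines, dtype=new_dt, delimiter=',')
print(arr.dtype)
```

With a real file you would pass the filename to both calls instead of the list of lines.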
=====================
astype works if the target dtype has the same number of compatible fields. For example, if I read the txt with dtype=None and then use the derived dt, it works:
In [530]: arr = np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None)
In [531]: arr
Out[531]:
array([(b'a', 1, 2, 4), (b'b', 6, 9, 10), (b'c', 4, 4, 3)],
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
In [532]: arr.astype(dt)
Out[532]:
array([('a', 1.0, 2.0, 4.0), ('b', 6.0, 9.0, 10.0), ('c', 4.0, 4.0, 3.0)],
      dtype=[('f0', '<U1'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<f4')])
The same goes for arr.astype('U3,int,float,int'), which also has 4 compatible fields.
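Here is the same astype conversion as a standalone sketch. The array is built directly rather than read from text, which is my simplification. Note that astype returns a new array and the original keeps its dtype, in line with the point above that fields can't be changed in-place.

```python
import numpy as np

# Same layout as the genfromtxt result: one bytes field, three int fields.
arr = np.array([(b'a', 1, 2, 4), (b'b', 6, 9, 10), (b'c', 4, 4, 3)],
               dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])

# Four target fields, each one castable from the corresponding source field.
converted = arr.astype('U3,int,float,int')

print(converted.dtype)   # the new array has the new field types
print(arr.dtype)         # the original array is unchanged
```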