How to force numpy.genfromtxt to generate a non-structured numpy array?

Question

In Python 3 I do:

from io import StringIO
import numpy
s = StringIO(u"1,1.3,abcde\n2,1.3,test")
data = numpy.genfromtxt(s, dtype=[int,float,'U10'], delimiter=',', names=None)

and I get:

array([(1, 1.3, 'abcde'), (2, 1.3, 'test')],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U10')])

I would like to get a regular numpy array with no names like the following:

array([[1, 1.3, 'abcde'], 
        [2, 1.3, 'test']])

Is it possible?

"Regular numpy arrays" only have one data type. If you try your last piece of code, all objects will be strings — JBernardo, Oct 20 '19 at 00:45

hpaulj · Answer 1 · 2019-10-20T03:12:03.573

With a text list:

In [338]: txt = '''1, 1.3, abcde 
     ...: 2, 1.3, def'''.splitlines()

The structured array:

In [339]: np.genfromtxt(txt, dtype=None, delimiter=',', encoding=None)          
Out[339]: 
array([(1, 1.3, ' abcde'), (2, 1.3, ' def')],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U6')])

Trying dtype=object, in the hope that each item gets its own type:

In [340]: np.genfromtxt(txt, dtype=object, delimiter=',', encoding=None)        
Out[340]: 
array([[b'1', b' 1.3', b' abcde'],
       [b'2', b' 1.3', b' def']], dtype=object)

It doesn't try to convert any strings to numbers.

Passing converters converts the columns correctly, but genfromtxt still returns a structured array:

In [341]: np.genfromtxt(txt, dtype=object, delimiter=',', encoding=None, converters={0:int, 1:float})
Out[341]: 
array([(1, 1.3, b' abcde'), (2, 1.3, b' def')],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', 'O')])

But you could convert the structured array to object dtype via a list:

In [346]: np.genfromtxt(txt, dtype=None, delimiter=',', encoding=None)          
Out[346]: 
array([(1, 1.3, ' abcde'), (2, 1.3, ' def')],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U6')])
In [347]: np.array(_.tolist(), object)                                          
Out[347]: 
array([[1, 1.3, ' abcde'],
       [2, 1.3, ' def']], dtype=object)
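The same conversion as a self-contained script, offered as a minimal sketch rather than the only way to do it. It assumes the two sample lines used above; the variable names structured and as_objects are made up for illustration, and autostrip=True is added here only to drop the leading spaces that show up in the strings above.

```python
import numpy as np

# Sample input, same shape as the text list used above.
txt = ["1, 1.3, abcde", "2, 1.3, def"]

# dtype=None lets genfromtxt infer a type per column, which yields a
# structured array. autostrip=True strips whitespace around each value.
structured = np.genfromtxt(txt, dtype=None, delimiter=",",
                           encoding=None, autostrip=True)

# tolist() returns a list of tuples holding plain Python objects;
# wrapping that in an object-dtype array gives a 2-D array without field names.
as_objects = np.array(structured.tolist(), dtype=object)

print(as_objects.shape)
print(as_objects)
```

Each cell of as_objects holds an ordinary Python int, float or str, so the per-column types survive, but NumPy no longer knows about them.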

Another option is to split the lines yourself, building a list of rows. genfromtxt does the same thing, with a few more bells and whistles.

In [357]: lines=[] 
     ...: for line in txt: 
     ...:     i = line.split(',') 
     ...:     x = (int(i[0]), float(i[1]), i[2].strip()) 
     ...:     lines.append(x) 

In [358]: lines                                                                 
Out[358]: [(1, 1.3, 'abcde'), (2, 1.3, 'def')]
In [359]: np.array(lines,object)                                                
Out[359]: 
array([[1, 1.3, 'abcde'],
       [2, 1.3, 'def']], dtype=object)

But beware that math on that object array is slower and more limited than on a numeric array, or even on the numeric fields of the structured array.
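To make that caveat concrete, here is a small sketch comparing a numeric field of the structured array with the matching column of the object array. The arrays are built by hand to mirror the outputs above, and names such as col and sqrt_on_objects_ok are illustrative only.

```python
import numpy as np

# Structured array mirroring the genfromtxt result shown earlier.
structured = np.array([(1, 1.3, 'abcde'), (2, 1.3, 'def')],
                      dtype=[('f0', 'i8'), ('f1', 'f8'), ('f2', 'U6')])
as_objects = np.array(structured.tolist(), dtype=object)

# A numeric field is an ordinary float64 array, so all of NumPy's math applies.
field_mean = structured['f1'].mean()

# The matching column of the object array has dtype object.
col = as_objects[:, 1]

# Many ufuncs have no loop for plain Python floats stored as objects.
try:
    np.sqrt(col)
    sqrt_on_objects_ok = True
except (TypeError, AttributeError):
    sqrt_on_objects_ok = False

# Casting the column back to a numeric dtype restores normal behaviour.
col_float = col.astype(float)
roots = np.sqrt(col_float)

print(field_mean, sqrt_on_objects_ok, roots)
```

Simple operations like addition still work on object arrays, because NumPy falls back to the Python operators of each element, but that path is slow and many functions are not supported at all.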

Great and complete answer! Are structured array and record arrays the same thing? — jtlz2, Jul 06 '21 at 14:08

score 0 · Answer 2 · answered Oct 20 '19 at 00:58

What you got is a "structured array" and it is superior to a "regular array" because it supports heterogeneous data types. Two of your columns are numbers but one is text, so it doesn't really make sense to collapse your data into a plain numpy.ndarray without structure. But if you want, you can:

numpy.array(data.tolist())

That will give you an ndarray with all strings:

array([['1', '1.3', 'abcde'],
       ['2', '1.3', 'test']], dtype='<U32')

But this is rarely a good idea. If we had more context, we might be able to suggest a better overall approach.
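Without knowing the real use case, one possible direction is to keep the structured array and address the columns by field name. The sketch below assumes the data from the question; the field names id, value and label are invented for the example and are not something genfromtxt produces on its own.

```python
from io import StringIO
import numpy as np

s = StringIO("1,1.3,abcde\n2,1.3,test")

# Hypothetical field names, chosen only to make the columns readable.
data = np.genfromtxt(s, delimiter=',',
                     dtype=[('id', int), ('value', float), ('label', 'U10')])

ids = data['id']        # integer column
values = data['value']  # float column, usable in normal NumPy math
labels = data['label']  # string column

first_row = data[0]     # one record, holding all three fields

print(values.sum())
print(labels.tolist())
```

Each column keeps its own dtype this way, so the numeric ones behave like any other NumPy array while the text stays text.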
