Your format is fighting a couple of assumptions that genfromtxt is making:
1) you have both comment lines and a header line (without a # character)
2) your column names have spaces, which genfromtxt insists on converting to _ (or some other valid character).
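A quick way to see the second point is a tiny in-memory file. The two-column data here is made up for illustration, and encoding=None is passed on the assumption that you are on a recent NumPy with Python 3, where it lets genfromtxt read str input without a warning:

```python
import io

import numpy as np

# Made-up tab-delimited data whose first column name contains a space.
text = "a b\tc\n1\t2\n3\t4\n"

A = np.genfromtxt(io.StringIO(text), delimiter='\t', names=True,
                  dtype=None, encoding=None)

# genfromtxt's name validation has turned the space into an underscore.
print(A.dtype.names)  # ('a_b', 'c')
```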
If I create a text file from your sample and replace the column-separating blanks with tabs (which is a pain, especially since my editors are set to replace tabs with spaces), this works:
In [330]: np.genfromtxt('stack29451030.txt',delimiter='\t',dtype=None,skip_header=3,names=True)
Out[330]:
array([(2, 8, 14, 748, 748, 748, 790), (2, 9, 22, 262, 245, 252, 328)],
dtype=[('p', '<i4'), ('q', '<i4'), ('r', '<i4'), ('y_1', '<i4'), ('y_2', '<i4'), ('y_3', '<i4'), ('y_4', '<i4')])
I played with replace_space=' '. It looks like genfromtxt only uses replacements that produce valid Python variable and attribute names, so 'y_1' is fine, but 'y 1' is not. I don't see a way around this using parameters.
The comments and names parameters don't cooperate in your case. genfromtxt can skip the comment lines, but then it reads the names line as data:
In [350]: np.genfromtxt('stack29451030.txt',delimiter='\t',dtype=None,comments='#')
Out[350]:
array([['p', 'q', 'r', 'y 1', 'y 2', 'y 3', 'y 4'],
['2', '8', '14', '748', '748', '748', '790'],
['2', '9', '22', '262', '245', '252', '328']],
dtype='|S3')
It can handle a names line like #p q r y1 y2 y3 y4, ignoring the #, but then it doesn't skip the earlier comment lines. So if you could remove either the comment lines or the header line, it could read the file. But with both present, it looks like you have to use something other than the comments parameter.
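For comparison, here is a sketch of the case where the header is the only commented line, using made-up data in an in-memory file (encoding=None is assumed, for recent NumPy on Python 3). With names=True, genfromtxt strips the leading # from that first line and uses the rest as field names; the space in 'y 1' still becomes an underscore:

```python
import io

import numpy as np

# Made-up file: the header line itself is the only line starting with '#'.
text = "#p\tq\ty 1\n2\t8\t748\n2\t9\t262\n"

A = np.genfromtxt(io.StringIO(text), delimiter='\t', names=True,
                  dtype=None, encoding=None)

print(A.dtype.names)  # ('p', 'q', 'y_1')
print(A['q'])         # [8 9]
```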
This looks like the cleanest load: explicitly skip the first 3 lines, accept the header line, and then use jedwards's idea to replace the _ with a space.
In [396]: A=np.genfromtxt('stack29451030.txt',delimiter='\t',dtype=None,skip_header=3,names=True)
In [397]: A.dtype.names = [n.replace('_', ' ') for n in A.dtype.names]
In [398]: A
Out[398]:
array([(2, 8, 14, 748, 748, 748, 790), (2, 9, 22, 262, 245, 252, 328)],
dtype=[('p', '<i4'), ('q', '<i4'), ('r', '<i4'), ('y 1', '<i4'), ('y 2', '<i4'), ('y 3', '<i4'), ('y 4', '<i4')])
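To try the whole recipe without the original file, here is a self-contained sketch. The three comment lines are placeholders (the real file's comments aren't shown), the data rows are taken from the output above, and encoding=None is an assumption for recent NumPy on Python 3. Note that the rename turns every underscore into a space, so it assumes none of the real column names contain an underscore of their own:

```python
import io

import numpy as np

# Placeholder reconstruction of the file: 3 comment lines, a tab-delimited
# header whose names contain spaces, then the data rows.
text = (
    "# comment line 1\n"
    "# comment line 2\n"
    "# comment line 3\n"
    "p\tq\tr\ty 1\ty 2\ty 3\ty 4\n"
    "2\t8\t14\t748\t748\t748\t790\n"
    "2\t9\t22\t262\t245\t252\t328\n"
)

# Skip the 3 comment lines explicitly, then read the header as field names.
A = np.genfromtxt(io.StringIO(text), delimiter='\t', dtype=None,
                  skip_header=3, names=True, encoding=None)

# genfromtxt produced 'y_1', 'y_2', ...; map the underscores back to spaces.
A.dtype.names = tuple(n.replace('_', ' ') for n in A.dtype.names)

print(A.dtype.names)  # ('p', 'q', 'r', 'y 1', 'y 2', 'y 3', 'y 4')
print(A['y 4'])       # [790 328]
```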
If you don't know how many comment lines there are, this generator can filter them out:
import numpy as np

with open('stack29451030.txt') as f:
    # drop every line that starts with '#', however many there are
    g = (line for line in f if not line.startswith('#'))
    A = np.genfromtxt(g, delimiter='\t', names=True, dtype=None)
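A self-contained version of the generator approach, with an arbitrary, made-up number of comment lines. io.StringIO stands in for the open file, and encoding=None is assumed (recent NumPy on Python 3). The header names still get the underscore treatment, so the rename step can be applied afterwards in the same way:

```python
import io

import numpy as np

# Made-up file with two comment lines; the count doesn't matter here.
text = (
    "# a comment\n"
    "# another comment\n"
    "p\tq\tr\ty 1\n"
    "2\t8\t14\t748\n"
    "2\t9\t22\t262\n"
)

with io.StringIO(text) as f:
    # The generator lazily skips comment lines; genfromtxt pulls lines from it.
    g = (line for line in f if not line.startswith('#'))
    A = np.genfromtxt(g, delimiter='\t', names=True, dtype=None, encoding=None)

print(A.dtype.names)  # ('p', 'q', 'r', 'y_1')
print(A['r'])         # [14 22]
```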
genfromtxt accepts input from any iterable, whether a file, a list of lines, or a generator like this.
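To illustrate that point, a small sketch (made-up data, encoding=None assumed for recent NumPy on Python 3) that reads the same text once from a plain list of lines and once from a file-like object. Both should give the same field names and values:

```python
import io

import numpy as np

text = "# comment\np\tq\n1\t2\n3\t4\n"

# A plain list of lines, with the comment line filtered out ...
lines = [line for line in text.splitlines() if not line.startswith('#')]
from_list = np.genfromtxt(lines, delimiter='\t', names=True,
                          dtype=None, encoding=None)

# ... and a file-like object wrapping the same text, skipping the comment line.
from_file = np.genfromtxt(io.StringIO(text), delimiter='\t', names=True,
                          dtype=None, encoding=None, skip_header=1)

print(from_list.tolist())  # [(1, 2), (3, 4)]
print(from_file.tolist())  # [(1, 2), (3, 4)]
```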