NumPy genfromtxt - skipping rows that start with specific number

Question

I have a file in which the first column holds an integer that tells whether the line is data or metadata. Lines that start with 0 are metadata, and the number of columns in them is not fixed, while any other integer designates a data line:

   0  -1.0000E+02  2.0000E+03 -1.0000E+03  
   0   NDIM=   3   IPS =   1   
   1     3   0    0  1.8279163801E+001  2.1982510269E-002

I would like to use NumPy to read only the lines that start with a non-zero integer. Can I do this with numpy.genfromtxt()?

unutbu · Accepted Answer · 2016-02-23T10:38:05.020

np.genfromtxt can accept an iterator as its first argument. So you could build a generator expression to yield just the desired lines:

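import re
import numpy as np
# Text mode, so the str regex matches each line (bytes lines would raise TypeError on Python 3).
lines = (line for line in open('data') if re.match(r'^\s*[1-9]', line))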

Then

In [61]: np.genfromtxt(lines)
Out[61]: 
array([  1.        ,   3.        ,   0.        ,   0.        ,
        18.2791638 ,   0.02198251])
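For readers who want to try this without a file on disk, here is a minimal self-contained sketch. It assumes a recent NumPy that accepts str lines, and it uses io.StringIO as a stand-in for the 'data' file, holding the three lines from the question. With a single matching line the result is 1-D, so np.atleast_2d is used here (my addition, not part of the answer) to get a consistent row-by-column shape.

```python
import io
import re

import numpy as np

# Stand-in for the 'data' file: the three sample lines from the question.
sample = io.StringIO(
    "   0  -1.0000E+02  2.0000E+03 -1.0000E+03\n"
    "   0   NDIM=   3   IPS =   1\n"
    "   1     3   0    0  1.8279163801E+001  2.1982510269E-002\n"
)

# Keep only lines whose first non-blank character is a digit from 1 to 9.
data_lines = (line for line in sample if re.match(r"\s*[1-9]", line))

arr = np.genfromtxt(data_lines)   # one data line, so this is a 1-D array
rows = np.atleast_2d(arr)         # always (n_rows, n_columns)
print(arr.shape, rows.shape)
```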

re.match(r'^\s*[1-9]', line) tests whether the line starts with optional whitespace followed by a digit from 1 to 9. If the non-zero integers could have leading zeros (for example 01), then you could instead use

lines = (line for line in open('data') if line.split() and int(line.split(None, 1)[0]) != 0)
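A more defensive variant of this filter can be written as a small generator function. The sketch below is only an illustration: the name nonzero_rows is hypothetical, and the sample line starting with 01 is a made-up variation of the question's data line, included to show the leading-zero case. It skips blank lines and lines whose first token is not an integer instead of raising.

```python
def nonzero_rows(lines):
    """Yield lines whose first whitespace-separated token is a non-zero integer."""
    for line in lines:
        tokens = line.split(None, 1)
        if not tokens:
            continue  # blank line: nothing to test
        try:
            first = int(tokens[0])  # int('01') == 1, so leading zeros are fine
        except ValueError:
            continue  # first token is not an integer
        if first != 0:
            yield line


# Hypothetical sample: two metadata lines, a blank line, and a data line
# whose leading integer is written with a leading zero.
sample = [
    "   0  -1.0000E+02  2.0000E+03 -1.0000E+03\n",
    "   0   NDIM=   3   IPS =   1\n",
    "\n",
    "  01     3   0    0  1.8279163801E+001  2.1982510269E-002\n",
]
kept = list(nonzero_rows(sample))
print(kept)
```

The generator can be handed to np.genfromtxt in place of the generator expression, for example np.genfromtxt(nonzero_rows(f)) with f being the file opened in text mode.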
