Fastest way to read every n-th row with numpy's genfromtxt

Question

I read my data with numpy's genfromtxt:

import numpy as np

# Load columns 3, 0 and 2, skipping 4 header lines and 2 footer lines.
measurement = np.genfromtxt('measurementProfile2.txt', delimiter=None, dtype=None,
                            skip_header=4, skip_footer=2, usecols=(3, 0, 2))
rows, columns = np.shape(measurement)

# Append a constant column filled with 394.
x = np.full((rows, 1), 394, dtype=measurement.dtype)
measurement = np.hstack((measurement, x))
np.savetxt('measurementProfileFormatted.txt', measurement)

This works fine, but I want only every 5th or 6th (in general, every n-th) row in the final output file. According to the documentation (numpy.genfromtxt.html) there is no parameter that would do that. I don't want to iterate over the array. Is there a recommended way to deal with this problem?

Alex Riley · Accepted Answer · 2015-01-15T12:28:33.010

4

To avoid reading the whole array you can combine np.genfromtxt with itertools.islice to skip the rows. This is marginally faster than reading the whole array and then slicing (at least for the smaller arrays I tried).

For instance, here's the contents of file.txt:

Then for example:

>>> import itertools
>>> import numpy as np
>>> with open('file.txt') as f_in:
...     x = np.genfromtxt(itertools.islice(f_in, 0, None, 3), dtype=int)

This gives an array x holding the lines at indices 0, 3 and 6 of that file:

array([12, 17, 62])
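
If the file has header lines, as in the question, one possible variation is to let islice start after them, since skip_header would otherwise count lines of the already-thinned stream. This is only a minimal sketch: the in-memory file, n_header and step below are made-up values for illustration, not taken from the question's data.

```python
import io
import itertools

import numpy as np

# Hypothetical stand-in for a data file: 2 header lines, then 9 data rows.
text = "header line 1\nheader line 2\n" + "".join(
    f"{i} {i * 10}\n" for i in range(9)
)

n_header = 2  # assumed number of header lines to skip
step = 3      # keep every 3rd data row

with io.StringIO(text) as f_in:
    # Start after the header, then take every `step`-th line.
    rows = itertools.islice(f_in, n_header, None, step)
    x = np.genfromtxt(rows, dtype=int)

print(x)  # data rows 0, 3 and 6
```

A footer is harder to handle this way, because the iterator does not know where the file ends; trimming the last rows of the resulting array may be simpler.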

edited Jan 15 '15 at 12:28

answered Jan 15 '15 at 12:02

Alex Riley


I like this one better than @elyase's. I feel it's more Pythonic. – user69453 Jan 15 '15 at 12:45
Yep, this is the right solution. I thought about it but assumed it would be slower without testing. – elyase Jan 15 '15 at 14:54
`genfromtxt` accepts anything that feeds it lines - a file, list of lines, generator, etc. There are earlier SO questions that do this - pass the file through a line filter. – hpaulj Jan 15 '15 at 17:51
Many people suggest just slicing after reading, but in my case it was definitely the reading that was slow. This helped immensely! – delrocco Jun 22 '18 at 20:27
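
The point in the comments about passing the file through a line filter could look roughly like this. It is a sketch only: every_nth is a hypothetical helper written for this example, not a NumPy function, and the sample lines are made up.

```python
import numpy as np


def every_nth(lines, n, start=0):
    """Yield lines start, start + n, start + 2n, ... from any iterable of lines.

    Hypothetical helper for illustration; not part of NumPy.
    """
    for i, line in enumerate(lines):
        if i >= start and (i - start) % n == 0:
            yield line


# Made-up sample data; an open file object would work the same way.
lines = ["1 2 3", "4 5 6", "7 8 9", "10 11 12", "13 14 15"]

# genfromtxt only ever sees the lines the generator yields (indices 0, 2, 4).
x = np.genfromtxt(every_nth(lines, 2), dtype=int)
print(x)
```

A generator like this is handy when the filter needs more logic than a fixed step, for example skipping lines that fail some check.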

score 0 · Answer 2 · answered Jan 15 '15 at 11:19


You have to read the whole file anyway. To select every n-th element, do something like:

>>> a = np.arange(50)
>>> a[::5]
array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])
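
The same slicing works row-wise on a 2D array such as the one genfromtxt returns for homogeneous data. A small sketch, using a made-up array in place of the real measurement data:

```python
import numpy as np

# Made-up 20x2 array standing in for the loaded measurement data.
data = np.arange(40).reshape(20, 2)

every_fifth = data[::5]   # rows 0, 5, 10, 15
from_fifth = data[4::5]   # rows 4, 9, 14, 19 (the 5th, 10th, ... row)

print(every_fifth)
print(from_fifth)
```

The start index decides whether "every 5th row" means rows 0, 5, 10, ... or the 5th, 10th, ... row counted from one.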

answered Jan 15 '15 at 11:19

elyase


score 0 · Answer 3 · answered Jan 15 '15 at 14:25


If you just want specific rows in the final output file then why not save only those rows instead of saving the whole 'measurement' matrix:

output_rows = [5, 7, 11]
np.savetxt('measurementProfileFormatted.txt', measurement[output_rows, :])
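
For a version that runs on its own, here is a sketch with a made-up array and an in-memory buffer in place of the output file. The array contents and the fmt="%g" choice are assumptions for the example, not part of the question.

```python
import io

import numpy as np

# Made-up 12x3 array standing in for the loaded measurement data.
measurement = np.arange(36, dtype=float).reshape(12, 3)

output_rows = [5, 7, 11]

# Write only the selected rows; a filename could be passed instead of the buffer.
buf = io.StringIO()
np.savetxt(buf, measurement[output_rows, :], fmt="%g")

print(buf.getvalue())
```

Indexing with a list copies just those rows, so the full array is never written out.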

answered Jan 15 '15 at 14:25

bitspersecond

