12

I'm looking for a pythonic way of iterating over first n items of an iterable (upd: not a list in a common case, as for lists things are trivial), and it's quite important to do this as fast as possible. This is how I do it now:

count = 0
for item in iterable:
    do_something(item)
    count += 1
    if count >= n: break

Doesn't seem neat to me. Another way of doing this is:

for item in itertools.islice(iterable, n):
    do_something(item)

This looks good, but the question is whether it's fast enough to use with some generator(s). For example:

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)
for item in itertools.islice(pair_generator(iterable), n):
    do_something(item)
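(For concreteness, here is a minimal sketch of what the pairing trick yields, written with Python 3's `zip` in place of `izip`:)

```python
from itertools import islice

# Zipping two references to the same iterator pairs up consecutive items.
pair_generator = lambda iterable: zip(*[iter(iterable)] * 2)

pairs = list(islice(pair_generator(range(10)), 3))
# pairs is [(0, 1), (2, 3), (4, 5)]
```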

Will it run fast enough as compared to the first method? Is there some easier way to do it?

martinthenext
    The only way to answer "fast enough" is to benchmark it yourself. – Glenn Maynard Apr 23 '10 at 21:51
  • See also: http://stackoverflow.com/questions/2688079/how-to-iterate-over-the-first-n-elements-of-a-list – outis Apr 23 '10 at 21:57
  • Why is it "quite important to do this as fast as possible"? Can you justify this with pstats results for a realistic use case? I suspect that your solution with `islice` will actually prove the best reasonable solution performance-wise, but of course we don't know without timing. – Mike Graham Apr 23 '10 at 22:02

5 Answers

16

for item in itertools.islice(iterable, n): is the most obvious, easy way to do it. It works for arbitrary iterables and is O(n), as any sane solution would be.

It's conceivable that another solution could have better performance; we wouldn't know without timing. I wouldn't recommend bothering with timing unless you profile your code and find this call to be a hotspot. Unless it's buried within an inner loop, it is highly doubtful that it will be. Premature optimization is the root of all evil.


If I were going to look for alternate solutions, I would look at ones like for count, item in enumerate(iterable): if count >= n: break ... and for i in xrange(n): item = next(iterator) .... I wouldn't guess these would help, but they seem worth trying if we really want to compare things. If I were stuck in a situation where I had profiled and found this to be a hotspot in an inner loop (is this really your situation?), I would also try to ease the name lookup, replacing the islice attribute access on the global itertools with the function already bound to a local name.
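The alternatives above might look like the following sketch (using Python 3 spellings, `zip`/`range` for `izip`/`xrange`; the function names are made up for illustration):

```python
from itertools import islice

def take_enumerate(iterable, n, do_something):
    # enumerate-based variant: count items and stop after n
    for count, item in enumerate(iterable):
        if count >= n:
            break
        do_something(item)

def take_next(iterable, n, do_something):
    # explicit-next variant: advance the iterator n times
    it = iter(iterable)
    for _ in range(n):
        do_something(next(it))

def take_islice(iterable, n, do_something, islice=islice):
    # islice bound to a local name to skip the global attribute lookup
    for item in islice(iterable, n):
        do_something(item)
```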

These are things you only do after you've proven they'll help. People try doing them at other times a lot. It doesn't make their programs appreciably faster; it just makes their programs worse.

Mike Graham
  • Well, using enumerate looks quite good to me too! As for profiling and finding hotspots, this is not actually my case; I just expect some loops in my code to have enormous iteration counts, which is why I asked. Now I get it - it was a mistake to try optimizing at this stage; I've got to finish the code and test it, and only then optimize, if needed. Thanks again for your help. – martinthenext Apr 23 '10 at 22:27
6

itertools tends to be the fastest solution, when directly applicable.

Obviously, the only way to check is to benchmark -- e.g., save in aaa.py

import itertools

def doit1(iterable, n, do_something=lambda x: None):
    count = 0
    for item in iterable:
        do_something(item)
        count += 1
        if count >= n: break

def doit2(iterable, n, do_something=lambda x: None):
    for item in itertools.islice(iterable, n):
        do_something(item)

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)

def dd1(itrbl=range(44)): doit1(itrbl, 23)
def dd2(itrbl=range(44)): doit2(itrbl, 23)

and see...:

$ python -mtimeit -s'import aaa' 'aaa.dd1()'
100000 loops, best of 3: 8.82 usec per loop
$ python -mtimeit -s'import aaa' 'aaa.dd2()'
100000 loops, best of 3: 6.33 usec per loop

so clearly, itertools is faster here -- benchmark with your own data to verify.

BTW, I find timeit MUCH more usable from the command line, so that's how I always use it -- it then runs the right "order of magnitude" of loops for the kind of speeds you're trying to measure, be that 10, 100, 1000, and so on -- here, to distinguish a difference of a couple of microseconds, a hundred thousand loops is about right.

Alex Martelli
    weird, it's just against my cplusplus'ish intuition to see a simple solution run slower than a neat one. python is the coolest language, indeed. this is a great addition to Mike Graham's advice not to do premature optimization. i guess the general rule is to write what's neat, not thinking about running time. – martinthenext Apr 23 '10 at 22:15
    @martin, personally, I think a _lot_ about running time (mostly in terms of big-O for scalability) -- but, in general, the most Pythonic idioms are often going to be the ones that have been most optimized, because they're usually the ones us Python committers tend to care most about (Hettinger, the author of itertools and other speedy parts of Python, has been quite active in that field in recent years, as was Peters in earlier years, but it's a pretty general phenomenon in the Python-committer community). – Alex Martelli Apr 24 '10 at 02:26
2

If it's a list then you can use slicing (using a name like mylist, so as not to shadow the built-in list):

mylist[:n]
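One caveat worth noting: slicing materializes a new list of the first n items, while itertools.islice iterates lazily without copying, which can matter for large lists (the names here are illustrative):

```python
from itertools import islice

mylist = list(range(1000))

head_copy = mylist[:5]            # builds a new 5-element list
head_lazy = islice(mylist, 5)     # lazy iterator over the same items

# Both produce [0, 1, 2, 3, 4] when iterated.
```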
Mark Byers
2

You can use enumerate to write essentially the same loop you have, but in a simpler, more Pythonic way (note the >= comparison, so exactly n items are processed):

for idx, val in enumerate(iterableobj):
    if idx >= n:
        break
    do_something(val)
Michael Aaron Safyan
  • This option has been discussed above, looks like a good option, but I think `islice` is better because it doesn't require any additional variables in the loop body, making it look clearer to me – martinthenext Apr 23 '10 at 22:37
1

Of a list? Try

for k in mylist[0:n]:
    # do stuff with k

You can also use a list comprehension if you need to:

my_new_list = [blah(k) for k in mylist[0:n]]
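For a generator rather than a list, the same comprehension can be written with itertools.islice, since generators don't support slicing (blah here is just a stand-in for any per-item function):

```python
from itertools import islice

def blah(k):
    # stand-in for whatever per-item work is needed
    return k * 2

gen = (x for x in range(100))   # gen[:5] would raise TypeError
my_new_list = [blah(k) for k in islice(gen, 5)]
# my_new_list == [0, 2, 4, 6, 8]
```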
Escualo