7

I have a large iterable, namely the one given by:

itertools.permutations(range(10))

I would like to access its millionth element. I have already solved the problem in several different ways.

  1. Casting the iterable to a list and getting the 1000000th element:

    return list(permutations(range(10)))[999999]
    
  2. Manually skipping elements up to 999999:

    p = permutations(range(10))
    for i in xrange(999999): p.next()
    return p.next()
    
  3. Manually skipping elements, v2:

    p = permutations(range(10))
    for i, element in enumerate(p):
        if i == 999999:
            return element
    
  4. Using islice from itertools:

    return islice(permutations(range(10)), 999999, 1000000).next()
    

But I still don't feel that any of them is Python's elegant way to do this. The first option is just too expensive: it needs to compute the whole iterable just to access a single element. If I'm not wrong, islice internally does the same computation I did in method 2, and is almost exactly the same as method 3; maybe it even has more redundant operations.

So I'm just curious whether Python has some other, more elegant way to access a specific element of an iterable, or at least to skip the first elements, or whether I just need to use one of the above.

Martijn Pieters
Imanol Luengo

3 Answers

20

Use the itertools recipe consume to skip n elements:

import collections
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

Note the islice() call there; it uses n, n, effectively not returning anything, and the next() function falls back to the default.
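As a hedged usage sketch of the recipe (the names `perms` and `millionth` are illustrative and not part of the recipe), consume() can advance a permutations iterator past the first 999999 items in place, so that the following next() call yields the millionth:

```python
import collections
from itertools import islice, permutations

def consume(iterator, n):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

perms = permutations(range(10))
consume(perms, 999999)    # discard the first 999999 permutations in place
millionth = next(perms)   # the iterator now sits on the millionth item
print(millionth)          # (2, 7, 8, 3, 9, 1, 5, 4, 6, 0)
```

Because consume() mutates the iterator it is given, this form is handy when you want to keep reading from the same iterator after the skip.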

Simplified to your example, where you want to skip 999999 elements, then return element 1000000:

return next(islice(permutations(range(10)), 999999, 1000000))

islice() processes the iterator in C, something that Python loops cannot beat.
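The same idea can be wrapped in a small helper. The sketch below mirrors the `nth` recipe from the itertools documentation; it uses a zero-based position and returns a default instead of raising StopIteration when the iterable is too short:

```python
from itertools import islice, permutations

def nth(iterable, n, default=None):
    """Return the item at zero-based position n, or default if there is none."""
    return next(islice(iterable, n, None), default)

# Position 999999 is the millionth item.
print(nth(permutations(range(10)), 999999))
# Only 3! = 6 permutations exist here, so the default is returned.
print(nth(permutations(range(3)), 100, default="too short"))
```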

To illustrate, here are the timings for just 10 repeats of each method (a Python 2 session, hence xrange and .next()):

>>> from itertools import islice, permutations
>>> from timeit import timeit
>>> def list_index():
...     return list(permutations(range(10)))[999999]
... 
>>> def for_loop():
...     p = permutations(range(10))
...     for i in xrange(999999): p.next()
...     return p.next()
... 
>>> def enumerate_loop():
...     p = permutations(range(10))
...     for i, element in enumerate(p):
...         if i == 999999:
...             return element
... 
>>> def islice_next():
...     return next(islice(permutations(range(10)), 999999, 1000000))
... 
>>> timeit('f()', 'from __main__ import list_index as f', number=10)
5.550895929336548
>>> timeit('f()', 'from __main__ import for_loop as f', number=10)
1.6166789531707764
>>> timeit('f()', 'from __main__ import enumerate_loop as f', number=10)
1.2498459815979004
>>> timeit('f()', 'from __main__ import islice_next as f', number=10)
0.18969106674194336

The islice() method is nearly 7 times faster than the next fastest method.

Martijn Pieters
  • That was a fast and really detailed answer. By the way, you also taught me a way to call functions using timeit. Thank you =D – Imanol Luengo May 28 '13 at 20:39
4

Finding the nth permutation may just be an example but if this is actually the problem you are trying to solve then there is a much better way to do this. Instead of skipping the elements of the iterable you can calculate the nth permutation directly. Borrowing the code from another answer here:

import math

def nthperm(li, n):
    li = list(li)
    n -= 1
    s = len(li)
    res = []
    if math.factorial(s) <= n:
        return None
    for x in range(s-1,-1,-1):
        f = math.factorial(x)
        d = n // f  # floor division keeps d an int on Python 3 as well
        n -= d * f
        res.append(li[d])
        del li[d]
    return res

Example and timing comparison:

In [4]: nthperm(range(10), 1000000)
Out[4]: [2, 7, 8, 3, 9, 1, 5, 4, 6, 0]

In [5]: next(islice(permutations(range(10)), 999999, 1000000))
Out[5]: (2, 7, 8, 3, 9, 1, 5, 4, 6, 0)

In [6]: %timeit nthperm(range(10), 1000000)
100000 loops, best of 3: 9.01 us per loop

In [7]: %timeit next(islice(permutations(range(10)), 999999, 1000000))
10 loops, best of 3: 29.5 ms per loop

Same answer, over 3000 times faster. Note that I did make a slight modification to the original code so that it will no longer destroy the original list.
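To make the idea behind the direct computation more visible, here is a minimal sketch, assuming Python 3. It writes n - 1 in the factorial number system and uses each digit as an index into the shrinking list of remaining items. The names `factoradic_digits` and `nth_permutation` are hypothetical helpers introduced for illustration, not the code from the linked answer:

```python
import math
from itertools import permutations

def factoradic_digits(n, size):
    """Return the `size` factorial-base digits of n, most significant first."""
    digits = []
    for x in range(size - 1, -1, -1):
        d, n = divmod(n, math.factorial(x))
        digits.append(d)
    return digits

def nth_permutation(items, n):
    """Return the n-th (1-based) permutation in itertools.permutations order."""
    pool = list(items)  # copy, so the caller's sequence is left untouched
    if not 1 <= n <= math.factorial(len(pool)):
        raise IndexError("n is out of range")
    digits = factoradic_digits(n - 1, len(pool))
    # Each digit selects one of the items that are still unused.
    return tuple(pool.pop(d) for d in digits)

print(factoradic_digits(999999, 10))         # [2, 6, 6, 2, 5, 1, 2, 1, 1, 0]
print(nth_permutation(range(10), 1000000))   # (2, 7, 8, 3, 9, 1, 5, 4, 6, 0)

# Sanity check against itertools on a small input.
small = list(permutations(range(4)))
assert all(nth_permutation(range(4), i + 1) == p for i, p in enumerate(small))
```

The work is proportional to the number of items rather than to n, which is where the speedup comes from.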

Andrew Clark
  • That was not the point of the question; I was just curious about the fastest way to skip elements from an iterable. However, your answer is an interesting way to find the nth permutation even faster. I gave you +1. Thx! – Imanol Luengo May 28 '13 at 20:48
2

It is indeed awfully wasteful to slurp up a million items just to get to the next one. Unfortunately, whether it can be avoided depends on your iterator: if the iterator has a way to skip directly to a particular offset, it can implement the __getitem__ method and you can use it to request iterator[1000000] directly. (How it gets there is up to the generating algorithm.)
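As a rough illustration of that point, here is a minimal sketch of a lazy sequence whose items can be computed from their position, so indexing costs the same no matter how far in you go. The `Squares` class is a hypothetical example invented for this sketch, not something from a library:

```python
class Squares:
    """Hypothetical lazy sequence of i * i for i in range(size)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        # Support negative indices the way built-in sequences do.
        if index < 0:
            index += self.size
        if not 0 <= index < self.size:
            raise IndexError("index out of range")
        # Computed directly from the position; no earlier items are generated.
        return index * index

squares = Squares(10**7)
print(squares[999999])     # jumps straight to the millionth item
print(list(Squares(4)))    # iteration still works: [0, 1, 4, 9]
```

Raising IndexError past the end is what lets plain iteration over such an object terminate.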

If your data source needs to generate all the prior values in order to get there, how you throw them away is the least of your problems. You can choose a nice way, but it's just icing on the cake.

PS. Given the context of your question I was going to outline an algorithm for generating the n-th permutation directly, but I see @F.J. beat me to it. Nice solution! :-)

alexis
  • Many processes cannot give you that option; network sockets for example. Or a file with variable-length lines, needing to skip `x` lines. But yes, if you can 'seek' directly to the item you need, then do use that. – Martijn Pieters May 28 '13 at 20:50
  • Exactly, that's what I was trying to get at. What's "inelegant" (read: inefficient) is that there's often no way to get out of generating the skipped items. – alexis May 28 '13 at 20:54