269

Given an iterator user_iterator, how can I iterate over the iterator and get a list of the yielded objects?

I have this code, which seems to work:

user_list = [user for user in user_iterator]

But is there something faster, better or more correct?

Karl Knechtel
systempuntoout
  • 3
    Before optimizing this, be sure you've done some profiling to prove that this really is the bottleneck. – S.Lott Sep 24 '10 at 20:52
  • 2
    @S.Lott. I normally agree with that attitude but, in this case, it very much should be optimized stylistically which, as is so often the case with Python, will optimize it for speed as well. – aaronasterling Sep 24 '10 at 21:10
  • 6
    The OP said nothing about having a bottleneck. It's a perfectly fine general question with a simple answer, it doesn't need to depend on a specific application that can be run through a profiler. – Ken Williams Mar 19 '17 at 03:49
  • 5
    The most compact way is `[*iterator]`. – Challenger5 Mar 27 '17 at 06:04

3 Answers

467
list(your_iterator)
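
For illustration, a minimal sketch with a hypothetical generator standing in for your_iterator; list() walks the iterator to exhaustion and collects every yielded object into a new list:

def fetch_users():
    # hypothetical generator, just for illustration
    yield 'alice'
    yield 'bob'

your_iterator = fetch_users()
user_list = list(your_iterator)   # consumes the iterator
print(user_list)                  # ['alice', 'bob']
print(list(your_iterator))        # [] -- already exhausted
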
mikerobi
  • 11
    Actually, almost always quite a bit faster. Also, much more obvious. – Thomas Wouters Sep 24 '10 at 20:52
  • 10
    @systempuntoout It runs entirely in C. The list comprehension is in python. Of course it runs faster. – aaronasterling Sep 24 '10 at 22:24
  • 4
    I still totally hate that there is no better way in python. It's tedious to have to edit both sides of an expression only to be able to slice or index it. (very common in python3, if it's a pure expression like zip, or map with a pure function) – Jo So Oct 24 '15 at 05:29
  • Hmm. `import matplotlib.pyplot as plt` followed by `ax = plt.gca()` and `list(ax._get_lines.prop_cycler)` results in an infinite loop. Is there an elegant way to handle this? – Jens Munk Sep 05 '17 at 19:02
  • 7
    In my fast testing, `[*your_iterator]` appeared to be about twice as fast as `list(your_iterator)`. Is this generally true, or was it just a specific occasion? (I used a `map` as iterator.) – Neinstein Sep 10 '18 at 21:49
  • @Neinstein Twice as fast is extremely exaggerated. It makes a difference of 0.1 seconds per 1000000 iterations. – Bachsau Sep 25 '19 at 11:20
  • @JoSo How much better can it get? This already is the cleanest and most obvious solution: Typecasting by calling the constructor of another class. – Bachsau Sep 25 '19 at 11:22
  • 2
    @Bachsau: Admittedly it's pretty good, but compare to Bash scripting where you can manipulate the current output by appending a pipe and another filter command strictly to the right of the current command. It sucks that for such a minor distinction (iterator vs materialized list) you often have to move the cursor back. – Jo So Sep 25 '19 at 13:19
  • If `your_iterator = reversed([1,2,3])`, then I'm getting `Error in argument: '(your_iterator)'`. Am I missing something? – Teepeemm May 06 '22 at 18:45
40

@Robino suggested adding some tests, which makes sense, so here is a simple benchmark of three possible ways (probably the most common ones) to convert an iterator to a list (a small sanity-check sketch follows the list):

  1. by type constructor

    list(my_iterator)

  2. by unpacking

    [*my_iterator]

  3. using list comprehension

    [e for e in my_iterator]
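
As a quick sanity check that all three give the same result (a minimal sketch; a fresh iterator is built for each conversion because every approach consumes it):

data = [3, 1, 2]

assert list(iter(data)) == data          # 1. type constructor
assert [*iter(data)] == data             # 2. unpacking
assert [e for e in iter(data)] == data   # 3. list comprehension

This is also why the benchmark functions below call iter(range(size)) each time: a range object is a sequence rather than an iterator, so wrapping it in iter() yields a fresh one-shot iterator for every call.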

I have been using the simple_benchmark library:

from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def convert_by_type_constructor(size):
    list(iter(range(size)))

@b.add_function()
def convert_by_list_comprehension(size):
    [e for e in iter(range(size))]

@b.add_function()
def convert_by_unpacking(size):
    [*iter(range(size))]


@b.add_arguments('Convert an iterator to a list')
def argument_provider():
    for exp in range(2, 22):
        size = 2**exp
        yield size, size

r = b.run()
r.plot()

[Benchmark plot: average runtime of the three conversion functions vs. iterator size]

As you can see, it is very hard to tell the difference between conversion by the constructor and conversion by unpacking; conversion by list comprehension is the “slowest” approach.


I have also been testing across different Python versions (3.6, 3.7, 3.8, 3.9) using the following simple script:

import argparse
import timeit

parser = argparse.ArgumentParser(
    description='Test convert iterator to list')
parser.add_argument(
    '--size', type=int, required=True,
    help='The number of elements from iterator')

args = parser.parse_args()

size = args.size
repeat_number = 10000

# do not wait too much if the size is too big
if size > 10000:
    repeat_number = 100


def test_convert_by_type_constructor():
    list(iter(range(size)))


def test_convert_by_list_comprehension():
    [e for e in iter(range(size))]


def test_convert_by_unpacking():
    [*iter(range(size))]


def get_avg_time_in_ms(func):
    avg_time = timeit.timeit(func, number=repeat_number) * 1000 / repeat_number
    return round(avg_time, 6)


funcs = [test_convert_by_type_constructor,
         test_convert_by_unpacking, test_convert_by_list_comprehension]

print(*map(get_avg_time_in_ms, funcs))

The script is executed as a subprocess from a Jupyter Notebook (or another script); the size parameter is passed through command-line arguments and the script results are read from standard output.

from subprocess import PIPE, run

import pandas

simple_data = {'constructor': [], 'unpacking': [], 'comprehension': [],
        'size': [], 'python version': []}


size_test = 100, 1000, 10_000, 100_000, 1_000_000
for version in ['3.6', '3.7', '3.8', '3.9']:
    print('test for python', version)
    for size in size_test:
        command = [f'python{version}', 'perf_test_convert_iterator.py', f'--size={size}']
        result = run(command, stdout=PIPE, stderr=PIPE, universal_newlines=True)
        constructor, unpacking,  comprehension = result.stdout.split()
        
        simple_data['constructor'].append(float(constructor))
        simple_data['unpacking'].append(float(unpacking))
        simple_data['comprehension'].append(float(comprehension))
        simple_data['python version'].append(version)
        simple_data['size'].append(size)

df_ = pandas.DataFrame(simple_data)
df_

[Table of results: average time in ms for each approach, per Python version and size]

You can get my full notebook from here.

In most of my tests unpacking turns out to be faster, but the difference is so small that the results may change from one run to another. Again, the comprehension approach is the slowest; in fact, the other two methods are up to ~60% faster.
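
To see where a figure like that ~60% comes from, the relative slowdown can be computed directly from the df_ collected above (a sketch using the same column names):

# ratios > 1 mean the comprehension is slower than the other approach
df_['comprehension / constructor'] = df_['comprehension'] / df_['constructor']
df_['comprehension / unpacking'] = df_['comprehension'] / df_['unpacking']

print(df_.groupby('python version')[
    ['comprehension / constructor', 'comprehension / unpacking']].mean())
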

Neuron
kederrac
37

Since Python 3.5 you can use the `*` iterable unpacking operator:

user_list = [*your_iterator]

But the Pythonic way to do it is:

user_list = list(your_iterator)
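
Both forms produce the same list; a minimal sketch (the iterator here is just an example):

your_iterator = iter(range(5))
user_list = [*your_iterator]       # [0, 1, 2, 3, 4]

your_iterator = iter(range(5))     # the first conversion exhausted it
user_list = list(your_iterator)    # [0, 1, 2, 3, 4]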
kederrac