1

I have two lists and I am trying to create one big list from them. The first list gives all the possible numbers of children a parent can have. Think of it as labels.

num_of_children = [0, 1, 2, 3, 4, 5]

The second list gives me how many parents have how many children. For example, 27 parents have 0 children, 22 of them have 1, and so on.

number_of_parents = [27, 22, 30, 12, 7, 2]

Using these two lists, I am trying to get a list that looks like this:

parent_num_of_children = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5]

So far I was able to do this with:

parent_num_of_children = []
for number in num_of_children:
    parent_num_of_children.extend([number] * number_of_parents[number])

My question is: Is there another way to get this list without a for loop, just using something like the range function or another clever way?

Thanks for your answers!

lyonserdar
  • Is `num_of_children` always just the sorted integers `0` to `len(number_of_parents) - 1`? Or is that just a bad example and code? – superb rain Jan 22 '21 at 16:01

4 Answers

6

With some itertools:

list(chain.from_iterable(map(repeat, num_of_children, number_of_parents)))
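For a self-contained version, a minimal sketch with the imports spelled out might look like the following. Passing two iterables to `map` calls `repeat(label, count)` pairwise, and `chain.from_iterable` then concatenates the resulting iterators. The intermediate name `repeated` is only there for illustration.

```python
from itertools import chain, repeat

num_of_children = [0, 1, 2, 3, 4, 5]
number_of_parents = [27, 22, 30, 12, 7, 2]

# map(repeat, labels, counts) lazily yields repeat(0, 27), repeat(1, 22), ...
repeated = map(repeat, num_of_children, number_of_parents)

# chain.from_iterable flattens those iterators into a single stream.
parent_num_of_children = list(chain.from_iterable(repeated))

print(len(parent_num_of_children))  # 100 (27 + 22 + 30 + 12 + 7 + 2)
```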

Benchmark:

0.23 s  0.35 s  0.33 s  original
0.67 s  0.64 s  0.72 s  Bram_Vanroy
1.36 s  1.48 s  1.52 s  Fredericka
0.29 s  0.35 s  0.34 s  superb_rain

See more benchmarks at the end.

Code:

import timeit
from itertools import chain, repeat

def original(num_of_children, number_of_parents):
    parent_num_of_children = []
    for number in num_of_children:
        parent_num_of_children.extend([number] * number_of_parents[number])
    return parent_num_of_children

def Bram_Vanroy(num_of_children, number_of_parents):
    return [c for c, p in zip(num_of_children,number_of_parents) for _ in range(p)]

def Fredericka(num_of_children, number_of_parents):
    parent_num_of_children = []
    for i in range(len(number_of_parents)):
        for n in range(number_of_parents[i]):
            parent_num_of_children.append(num_of_children[i])
    return parent_num_of_children

def superb_rain(num_of_children, number_of_parents):
    return list(chain.from_iterable(map(repeat, num_of_children, number_of_parents)))

funcs = original, Bram_Vanroy, Fredericka, superb_rain
num_of_children = [0, 1, 2, 3, 4, 5]
number_of_parents = [27, 22, 30, 12, 7, 2]

# Correctness
expect = original(num_of_children, number_of_parents)
for func in funcs:
    result = func(num_of_children, number_of_parents)
    print(result == expect, func.__name__)
print()

# Speed (4 rounds; printing ts[1:] leaves the first round out of the columns)
tss = [[] for _ in funcs]
for _ in range(4):
    for func, ts in zip(funcs, tss):
        t = min(timeit.repeat(lambda: func(num_of_children, number_of_parents), number=100000))
        ts.append(t)
        print(*('%.2f s ' % t for t in ts[1:]), func.__name__)
    print()

Another benchmark, with the "larger" case num_of_children = [0, 1, 2, 3, 4, 5] * 100 and number_of_parents = [27, 22, 30, 12, 7, 2] * 100 (and number=1000):

0.25 s  0.17 s  0.16 s  original
0.57 s  0.41 s  0.40 s  Bram_Vanroy
1.22 s  1.19 s  1.17 s  Fredericka
0.16 s  0.16 s  0.17 s  superb_rain

Yet another, where I instead increase the values with number_of_parents = [p * 100 for p in number_of_parents] (and again number=1000):

0.09 s  0.09 s  0.09 s  original
0.46 s  0.38 s  0.38 s  Bram_Vanroy
1.27 s  1.56 s  1.22 s  Fredericka
0.07 s  0.07 s  0.09 s  superb_rain

And with the data suggested by @BramVanroy's comment, num_of_children = [i for i in range(100)]; number_of_parents = [random.randint(500,1000) for _ in range(100)] (and number=100):

0.06 s  0.05 s  0.05 s  original
0.27 s  0.25 s  0.25 s  Bram_Vanroy
0.91 s  0.89 s  0.90 s  Fredericka
0.05 s  0.05 s  0.05 s  superb_rain
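To try the last data set yourself, a reduced sketch along these lines should work. It compares only two of the functions, and the fixed seed and the small `number`/`repeat` values are my assumptions to keep the run short and repeatable, so the timings will not match the table above.

```python
import random
import timeit
from itertools import chain, repeat

def original(num_of_children, number_of_parents):
    result = []
    for number in num_of_children:
        # Relies on each label also being a valid index into number_of_parents.
        result.extend([number] * number_of_parents[number])
    return result

def superb_rain(num_of_children, number_of_parents):
    return list(chain.from_iterable(map(repeat, num_of_children, number_of_parents)))

random.seed(0)  # assumption: fixed seed, only so that runs are repeatable
num_of_children = list(range(100))
number_of_parents = [random.randint(500, 1000) for _ in num_of_children]

# Both approaches should build the same list.
assert original(num_of_children, number_of_parents) == superb_rain(num_of_children, number_of_parents)

for func in (original, superb_rain):
    # Best of three repeats, 10 calls each (far fewer than in the benchmark above).
    t = min(timeit.repeat(lambda: func(num_of_children, number_of_parents), number=10, repeat=3))
    print('%.4f s' % t, func.__name__)
```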
superb rain
  • timing this for such a small list might not be very meaningful. – juanpa.arrivillaga Jan 22 '21 at 16:13
  • @juanpa.arrivillaga Perhaps. Not long ago, someone else's quadratic time solution beat my linear time one because it turned out that the OP's data actually was always small. Anyway, suggest something larger and I'll test it :-). I suspect the repeat-in-C ones will still beat the repeat-in-Python ones. – superb rain Jan 22 '21 at 16:17
  • That would be my intuition as well, unless, say, you had something that barely fit in memory without the repetition that would get into swap with the repetition – juanpa.arrivillaga Jan 22 '21 at 16:19
  • You can quite easily test with larger lists, something like `num_of_children = [i for i in range(100)]; number_of_parents = [random.randint(500,1000) for _ in range(100)]`. (Might need to decrease the number of trials in timeit to do this in reasonable time though.) In all the values that I tested the results were the same. As said before, I am quite impressed by the results. If you have any intuition why the results are what they are (why extend is so fast and why chain.from_iterable is even faster), I'd be glad to hear it. +1 – Bram Vanroy Jan 22 '21 at 16:28
  • @juanpa.arrivillaga Sounds like quite some edge case :-). I think it would only affect the `original` solution, right? Added two more benchmarks now, with "larger" cases (not sure which enlarging is realistic for the OP, if any). – superb rain Jan 22 '21 at 16:32
  • @BramVanroy Yeah, I already added two more benchmarks now. Originally I didn't because there are multiple ways to go larger, and I didn't want to pick. Added one with your data as well now. My intuition about the speeds is what I mentioned already: Having the repetitions/extensions done in C is faster than doing it yourself in Python. – superb rain Jan 22 '21 at 16:39
  • @BramVanroy list repetition, `some_list * n`, is *very* fast. It's basically a C-level loop that pre-allocates the buffer (so no re-sizing) and quickly copies pointers in the raw PyObject pointer array using pointer arithmetic. – juanpa.arrivillaga Jan 22 '21 at 17:08
2

You can do this with a single list comprehension. Iterating over range(p) emits each value directly, so there are no sublists to flatten afterwards. (I originally claimed that extending existing lists is relatively slow; I stand corrected, see superb rain's answer.)

num_of_children = [0, 1, 2, 3, 4, 5]
number_of_parents = [27, 22, 30, 12, 7, 2]
parent_num_of_children = [c for c, p in zip(num_of_children,number_of_parents) for _ in range(p)]
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5]
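Because `zip` pairs each label with its count by position, this form should not depend on the labels being the integers 0 to n - 1. A small sketch with made-up data (the names `labels`, `counts` and `expanded` are hypothetical, not from the question):

```python
labels = [10, 20, 30]  # hypothetical labels that are not list positions
counts = [2, 0, 3]     # a count of 0 simply contributes nothing

# Each label is emitted once per iteration of range(count).
expanded = [label for label, count in zip(labels, counts) for _ in range(count)]

print(expanded)  # [10, 10, 30, 30, 30]
```

With data like this, a loop that indexes `counts[label]` would raise an IndexError, since 10 is not a valid position in a three-element list.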
Bram Vanroy
  • Claiming that their extending is relatively slow really needs a benchmark. – superb rain Jan 22 '21 at 15:58
  • @superbrain Extending existing lists in addition to having to create the new lists before extending is bound to be slower than a list comprehension. I would place bets on that but do not have the time to write bench code. Would love to be proven wrong, though. – Bram Vanroy Jan 22 '21 at 16:01
  • @BramVanroy `.extend` is *not* relatively slow. What might slow things down, as you note, is that they create a list *before* extending – juanpa.arrivillaga Jan 22 '21 at 16:12
  • @BramVanroy But your list comprehension creates every element in Python code, whereas the list repetition and list extension do it in C code. See benchmark in my answer now. – superb rain Jan 22 '21 at 16:13
  • @superbrain yeah, list repetition is about the fastest way to do what it does – juanpa.arrivillaga Jan 22 '21 at 16:16
  • My wording was a bit wrong, indeed. I meant to focus on the creation of the list, which I always thought to be prohibitively slow. @superbrain I stand corrected. I must say I am very surprised by this outcome, especially the fact that the OP's code is so much faster than my suggestion. – Bram Vanroy Jan 22 '21 at 16:27
0

Here is a simple solution:

parent_num_of_children = []
for i in range(len(number_of_parents)):
    for n in range(number_of_parents[i]):
        parent_num_of_children.append(num_of_children[i])
Fredericka
0

This list comprehension does the trick without any additional tools:

parent_num_of_children = [num_of_children[i] for i in range(len(number_of_parents)) for _ in range(number_of_parents[i])]
PApostol