
Hi, I am fairly new to Python. I am trying to generate the powerset (all combinations) of a list of integers, using the recommended recipe:

from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

My list of integers comes in a numpy array from a pandas dataframe. Each int32 integer costs 48 bytes as a Python object (not quite sure why so much). Thus, as the list of integers grows, it starts placing significant demands on RAM (e.g. with 24 integers, at some point the materialized list is about 800 MB in size).
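For what it's worth, the per-object overhead can be inspected directly; the sketch below assumes a 64-bit CPython build, where a small int object is 28 bytes (the 48 bytes I measured presumably includes additional per-element overhead):

import sys

x = 1_000_000            # a plain Python int
print(sys.getsizeof(x))  # 28 on a typical 64-bit CPython build

# A list of n such ints also stores one 8-byte pointer per element,
# so each element costs roughly 28 + 8 = 36 bytes before any powerset math.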

Is there a way around this? How would one manage the memory efficiently if, say, you wanted to generate the powerset of 50 integers or more?
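For concreteness, here is a minimal sketch of what I understand lazy consumption to look like -- iterating the generator directly rather than materializing it with list() (the per-subset work and the input size of 20 are placeholders):

from itertools import chain, combinations

def powerset(iterable):
    # Yield every subset as a tuple, one at a time.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Consume the generator directly -- never wrap it in list().
# Peak memory is one subset at a time, not 2**n of them.
count = 0
for subset in powerset(range(20)):
    count += 1            # replace with the real per-subset work
print(count)              # 2**20 == 1048576

(If the input is a numpy array, arr.tolist() first converts the elements to plain Python ints.)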

Thank you for any answers / pointers in advance.

TvelA
  • Don't materialize a giant list of integers. – juanpa.arrivillaga Jan 30 '21 at 22:13
  • Note, Python objects are relatively heavyweight. But *even* assuming each one cost 1 byte (it doesn't), the powerset of a set of size 50 has size 2**50 ≈ 1.1e15, which is `(2**50)*1e-9` -> about 1.1e6 *gigabytes*. So even if you don't materialize the list and just lazily process it, something that large is going to take a long time. – juanpa.arrivillaga Jan 30 '21 at 22:18
  • So that's about a *petabyte*, and again, we know it doesn't cost 1 byte; it actually costs *8 bytes per pointer to the object, plus 48 bytes for the object*, so it really costs 56 bytes. Even at a petabyte you are in the realm of truly big data; you'd need some sort of cluster of machines with some distributed computing approach. – juanpa.arrivillaga Jan 30 '21 at 22:27
  • This isn't just a *memory* issue. I do not think you are really grasping the magnitude of what you want. Imagine the processing of each element took 1 nanosecond (it almost certainly takes orders of magnitude more); then processing it serially, without some sort of giant cluster/distributed computing approach, would take [*over 20 million years*](https://www.wolframalpha.com/input/?i=3**50+nanoseconds). – juanpa.arrivillaga Feb 01 '21 at 17:33
  • Thanks, I am fully cognizant of the memory issue. However, ultimately I need to work out a function that uses the powerset of the integers as the x-axis and the corresponding probabilities as the y-axis. I guess I can decompose the powerset into smaller chunks that are then assembled into the full powerset using cartesian products from a file (see the sketch I added after these comments). I wonder what that does to the speed of the routine, though. – TvelA Feb 01 '21 at 17:38
  • You are not getting it. – juanpa.arrivillaga Feb 01 '21 at 17:39
  • I wonder how the statistical function is then estimated without using brute force – TvelA Feb 01 '21 at 17:39
  • That's a good question. Probably what you should be looking into. There are other, more math/statistics-oriented Stack Exchange sites that may be more helpful in that regard. – juanpa.arrivillaga Feb 01 '21 at 17:40
  • Anyhow, thanks for being there to bounce ideas against. – TvelA Feb 01 '21 at 17:46
  • So, at a high level, are you trying (in some vague sense) to construct a function P that, given a subset S' of S, will return the probability that S' occurs? How do you establish the probabilities in the first place? – Tim Boddy Feb 01 '21 at 22:40
  • individual outcome probabilities are known / part of the input in my model – TvelA Feb 02 '21 at 16:26
  • Can you clarify a bit more what the inputs look like (including the probabilities) and what your function takes as input and must supply as output? – Tim Boddy Feb 02 '21 at 19:29
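Edit (following up on the chunking idea in the comments above): here is a minimal sketch of processing the powerset stream in bounded-size batches with itertools.islice. The batch size, input size, and per-batch work are placeholders; this bounds memory, but does nothing about the 2**n running time:

from itertools import chain, combinations, islice

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def batches(gen, size):
    # Slice a generator into lists of at most `size` items.
    it = iter(gen)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Hypothetical per-batch work: write to disk, update running statistics, etc.
for chunk in batches(powerset(range(20)), 100_000):
    pass  # peak memory is ~one chunk of subsets, never the full powerset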

0 Answers