
Edit:
After all these discussions with juanpa and fusion here in the comments, and with Kevin in Python chat, I have concluded that iterating through a generator takes about as long as iterating through a fully built list of the same combinations, because the generator only produces those combinations on the fly. Moreover, fusion's approach worked great for len(arr) up to 1000 (maybe up to 5k, but beyond that it terminates due to a timeout on the online judge). Please note that the timeout is not caused by finding min_variance_sub; I also have to compute the sum of absolute differences over all possible pairs in min_variance_sub. I am going to accept fusion's approach as the answer to this question, because it answers the question. But I will also create a new question for that problem statement (more like a Q&A, where I will also answer the question for future visitors; I got the answer from other candidates' submissions, the problem setter's editorial, and the problem setter's own code, though I do not understand the approach they used). I will link to the other question once I create it :)
It's HERE
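For reference, the sum of absolute differences over all pairs of a sub-list can be computed without enumerating the pairs. Below is a minimal sketch with hypothetical helper names; it uses a standard counting identity and is not the problem setter's code. After sorting, the element at index i is the larger member of i pairs and the smaller member of len(xs) - 1 - i pairs, so its net contribution is xs[i] * (2*i - len(xs) + 1).

```python
from itertools import combinations


def pairwise_abs_sum_slow(xs):
    """Sum of |a - b| over all pairs, enumerating pairs: O(k^2)."""
    return sum(abs(a - b) for a, b in combinations(xs, 2))


def pairwise_abs_sum_fast(xs):
    """Same total in O(k log k).

    After sorting, ys[i] is the larger element in i pairs (+ys[i] each)
    and the smaller element in n - 1 - i pairs (-ys[i] each).
    """
    ys = sorted(xs)
    n = len(ys)
    return sum(y * (2 * i - n + 1) for i, y in enumerate(ys))


print(pairwise_abs_sum_slow([1, 2, 3]), pairwise_abs_sum_fast([1, 2, 3]))
```

`pairwise_abs_sum_slow` mirrors the `itertools.combinations(..., 2)` approach mentioned in the comments and is handy as a cross-check on small inputs.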

The original question starts below

I'm using itertools.combinations on an array so first up I tried something like

aList = [list(x) for x in list(cmb(arr, k))]

where cmb = itertools.combinations, arr is the list, and k is an int. This works fine for len(arr) < 20 or so, but it raised a MemoryError once len(arr) reached 50 or more.
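A quick way to see why building the full list fails is to count the combinations first. A minimal sketch using `math.comb` (Python 3.8+); `count_combinations` is only an illustrative wrapper, and the sizes are the len(arr) = 50, k = 8 case mentioned in the comments plus a smaller one for contrast:

```python
import math


def count_combinations(n, k):
    """Number of k-element combinations of n items ("n choose k")."""
    return math.comb(n, k)


print(count_combinations(20, 8))  # 125970
print(count_combinations(50, 8))  # 536878650
```

Every one of those combinations becomes its own Python list in `aList`, so at 50 choose 8 the list would hold more than half a billion list objects, far more than typical memory allows.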

On a suggestion by Kevin in Python Chat, I used a generator instead, and it seemed to generate those combinations amazingly fast:

aGen = (list(x) for x in cmb(arr, k))

But it's very slow to iterate through this generator object. I tried something like

for p in aGen:
    continue

and even this code seems to take forever.
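To illustrate the point juanpa makes in the comments (creating the generator does no real work), here is a small sketch; `arr = list(range(50))` and `k = 8` are stand-in values, not the original data:

```python
from itertools import combinations, islice

arr = list(range(50))  # stand-in data
k = 8

# Building the generator is instant: no combination has been produced yet.
a_gen = (list(x) for x in combinations(arr, k))

# The cost is paid only when items are pulled out; islice takes the first three.
first_three = list(islice(a_gen, 3))
print(first_three)
```

Exhausting `a_gen` with a `for` loop still has to produce every one of the 536,878,650 combinations, which is why even a loop body of `continue` seems to take forever.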

Kevin also suggested an answer about finding the kth combination, which was nice, but in my case I actually want to test all the possible combinations and select the one with the minimum variance.

So what would be a memory-efficient way of checking all possible combinations of an array (a list) to find the one with minimum variance? (To be precise, I only need to consider sub-lists having exactly k elements.)

Thank you for any help.
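For small inputs, the direct approach described here fits in a few lines and is useful as a reference when checking faster methods. A sketch, assuming population variance via `statistics.pvariance` and a hypothetical helper name `min_variance_brute_force`. It consumes the combinations lazily, so memory stays small, but the running time still grows with n choose k:

```python
from itertools import combinations
from statistics import pvariance


def min_variance_brute_force(arr, k):
    """Check every k-element combination and return the one with the
    smallest population variance, plus that variance.

    Only practical for small inputs; raises ValueError if k > len(arr).
    """
    best = min(combinations(arr, k), key=pvariance)
    return list(best), pvariance(best)


print(min_variance_brute_force([10, 1, 7, 3, 2], 3))
```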

P S Solanki
  • What is k here? Almost certainly, you are just working with very many combinations. – juanpa.arrivillaga Sep 02 '20 at 04:52
  • Note, `aGen = (list(x) for x in cmb(arr, k))` doesn't generate the combinations, it creates *a generator* which generates the combinations on the fly as you iterate over it. So of course it's very fast, it doesn't really do any work – juanpa.arrivillaga Sep 02 '20 at 04:55
  • In the current case, len(arr) is 50 and k is 8. And yes, the number of combinations is definitely large. – P S Solanki Sep 02 '20 at 05:29
  • @juanpa I see. So is it more or less like the kth-combination thing (of course, without actually having indexed combinations)? – P S Solanki Sep 02 '20 at 05:31
  • 50 choose 8 is 536,878,650. Half a billion iterations. Assuming the work you do on each iteration takes, say, 1 millisecond then it would require `536878650 * 1e-3 / (60*60) == 149.13295833333333` hours to complete. Now, perhaps the work you are doing on each iteration is less, but that gives you a good idea how long this could potentially take. What operation are you doing with each combination? – juanpa.arrivillaga Sep 02 '20 at 05:49
  • :O I will be doing some heavy work on each iteration. To be precise, I will compute the `variance` of each sub-array (or sub-list) and select the one with the `minimum variance` (I will use `statistics.variance()` to calculate the variance because the naive approach would only add more operations, resulting in a disastrous time complexity). – P S Solanki Sep 02 '20 at 06:14
  • Actually, the naive approach might be better: the `statistics` package has to handle various different numeric types, and it takes great care, so there's a lot of overhead. I don't think the time complexity would be different in any case, but of course, here constant factors matter – juanpa.arrivillaga Sep 02 '20 at 06:16
  • I will consider that. But the thing is, the program doesn't even get anywhere near the point where I actually use either the naive approach or `statistics`. It can't get past the `for p in aGen: continue` block. And unfortunately I am supposed to do this whole process in under a few seconds (it is actually one of the hard algorithmic challenges on `hackerrank.com`). I am starting to think Python is not really suitable for these tasks (or maybe it is roughly the same with other languages). Is there a way around this, or does it have to take hours to achieve what I intend? – P S Solanki Sep 02 '20 at 06:30
  • Trust me when I say there are test cases with len(arr) beyond 10^5 for this problem. – P S Solanki Sep 02 '20 at 06:33
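Following juanpa's remark about the overhead in `statistics`, a plain-arithmetic variance is easy to write. A sketch with a hypothetical helper `plain_pvariance` and made-up sample data; note that `statistics.variance` is the sample variance (divides by n - 1), while `statistics.pvariance` and the helper divide by n:

```python
import statistics


def plain_pvariance(xs):
    """Population variance using plain float arithmetic (no statistics overhead)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


data = [4, 8, 15, 16, 23, 42]  # made-up sample
print(plain_pvariance(data))
print(statistics.pvariance(data))
print(statistics.variance(data))  # sample variance: divides by n - 1
```

Because every candidate has exactly k elements, the two definitions differ only by the constant factor k / (k - 1), so either one selects the same minimum-variance sub-list.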

1 Answer


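First, sort the list of n elements.

Then slide a window of length k along the sorted list.

Find the minimum variance among the n-k+1 windows (contiguous sub-lists).

That minimum is also the minimum over all k-element combinations: if a k-element subset skips a value lying strictly between its smallest and largest elements, swapping that value in for whichever endpoint lies farther from the subset's mean strictly lowers the variance, so the optimal subset is always contiguous in sorted order.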

 
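def myvar(arr):
    # Population variance: average squared deviation from the mean.
    l = len(arr)
    m = sum(arr)/l
    return sum((i-m)**2 for i in arr)/l


input_list = [.......]  # your numbers here
# k (the required sub-list size) is assumed to be defined already.

sorted_list = sorted(input_list)

variance = None
min_variance_sub = None
# Slide a window of length k over the sorted list: n-k+1 candidate windows.
for i in range(len(sorted_list) - k + 1):
    sub = sorted_list[i:i+k]
    var = myvar(sub)
    # Keep the window with the smallest variance seen so far.
    if variance is None or var < variance:
        variance = var
        min_variance_sub = sub
print(min_variance_sub)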
fusion
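The loop above recomputes `myvar` from scratch for every window, which costs O(k) per window and O(n*k) overall. One possible refinement, not part of the original answer, is to keep running sums of the window's values and of their squares so each slide costs O(1); the sort then dominates at O(n log n). A sketch with a hypothetical function name, assuming numeric (ideally integer) input:

```python
def min_variance_window(arr, k):
    """Return the k-element window of sorted(arr) with the smallest variance.

    Keeps running sums of the window's values (s) and squares (sq), so each
    slide costs O(1). Comparing k*sq - s*s (which equals k**2 * variance)
    avoids division and keeps integer inputs exact.
    """
    xs = sorted(arr)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and len(arr)")
    s = sum(xs[:k])
    sq = sum(x * x for x in xs[:k])
    best_score = k * sq - s * s
    best_start = 0
    for i in range(1, len(xs) - k + 1):
        # Slide the window one step: drop xs[i - 1], add xs[i + k - 1].
        out, new = xs[i - 1], xs[i + k - 1]
        s += new - out
        sq += new * new - out * out
        score = k * sq - s * s
        if score < best_score:
            best_score, best_start = score, i
    return xs[best_start:best_start + k]


print(min_variance_window([10, 1, 7, 3, 2], 3))
```

For integers the score comparison is exact; for floats it can lose precision through cancellation when the values are large and close together.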
  • Is that a way to partition the problem for parallel execution? – Pynchia Sep 02 '20 at 06:49
  • @Pynchia Sure, you can split the `sorted_list` into overlapping chunks with `k` overlap, then use `multiprocessing` to compute minimum variance for each chunk, then combine the results and find the global minimum. – fusion Sep 02 '20 at 06:53
  • OK thanks. Apart from the multiprocessing, please expand on it with examples; your answer is too concise to be clear enough. – Pynchia Sep 02 '20 at 06:55
  • @Pynchia Do you want examples for the answer to 'how to find the minimal variance' or 'how to partition the problem for parallel execution'? – fusion Sep 02 '20 at 07:02
  • @fusion I appreciate your response. But the online judge system will not allow numpy (or any module that is not in the official Python distribution), so although your approach is totally valid, it won't work in my case. – P S Solanki Sep 02 '20 at 12:00
  • @PSSolanki In that case, you can write your own variance calculator function. See my edits above. – fusion Sep 02 '20 at 13:42
  • Alright, I'll give that a go and let you know the outcome. – P S Solanki Sep 02 '20 at 14:52
  • @fusion It worked really well, but it seems like it doesn't check all the possible combinations (each having exactly `k` elements), OR it includes duplicate combinations (I can't figure out which one is the case). After I get the `min_variance_sub` I need to find the sum of all the absolute differences between its elements (and I'm able to do that by creating `itertools.combinations` of 2 elements each and taking the `abs` difference). Your approach worked well for `len(arr) < 20` but it gives a wrong answer for `len(arr) >= 20` – P S Solanki Sep 03 '20 at 04:26
  • If you can move this convo to a chat room (as suggested by SO), I would give you the input samples and expected output (or if you want - the problem statement as well). I've been stuck on that practice problem for over 5 days :( – P S Solanki Sep 03 '20 at 04:28
  • Possible typo: should `myvar`'s return statement instead be `return sum((i-m)**2 for i in arr)/l`? – Kevin Sep 03 '20 at 16:19
  • @PSSolanki How do I move to a chat room? I'm new here, so I'm not sure how to do that. If you can post your input examples, I would like to help. – fusion Sep 03 '20 at 21:18
  • @fusion I will do that. Just so you know, chat.stackoverflow.com allows people to interact. I actually discussed your approach with Kevin yesterday in my favorite [Python Chat](https://chat.stackoverflow.com/rooms/6/python), and we concluded that we may need to go with something else (a different approach). Anyway, your approach helped me clear 5 more test cases (you will know what I'm talking about when you see the problem statement and the expected outputs for the input test cases). – P S Solanki Sep 04 '20 at 04:59
  • @fusion Join the chat room here - [Join Room](https://chat.stackoverflow.com/rooms/220955/a-discussion-on-variance-generators) You and kevin have explicit write access. – P S Solanki Sep 04 '20 at 06:04
  • @kevin Consider joining the room, please :) – P S Solanki Sep 04 '20 at 06:06
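A sketch of the partitioning fusion describes for parallel execution. Consecutive chunks of the sorted list only need to share k - 1 elements for every length-k window to fall entirely inside some chunk (sharing k, as suggested above, also works). The helper names and the default chunk size are hypothetical; the per-chunk work is mapped sequentially here, and `multiprocessing.Pool.starmap` could be swapped in to run it in parallel:

```python
def overlapping_chunks(xs, k, size):
    """Split xs into chunks of at most `size` items sharing k - 1 items,
    so every length-k window of xs lies inside at least one chunk."""
    if size < k:
        raise ValueError("chunk size must be at least k")
    step = size - k + 1  # window start positions covered by each chunk
    return [xs[start:start + size] for start in range(0, len(xs) - k + 1, step)]


def best_window(chunk, k):
    """Return (variance, window) for the lowest-variance k-window of a sorted chunk."""
    best = None
    for i in range(len(chunk) - k + 1):
        w = chunk[i:i + k]
        m = sum(w) / k
        var = sum((x - m) ** 2 for x in w) / k
        if best is None or var < best[0]:
            best = (var, w)
    return best


def min_variance_chunked(arr, k, size=1000):
    xs = sorted(arr)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and len(arr)")
    chunks = overlapping_chunks(xs, k, size)
    # Chunks are independent; with multiprocessing.Pool,
    # pool.starmap(best_window, [(c, k) for c in chunks]) would do this in parallel.
    results = map(best_window, chunks, [k] * len(chunks))
    return min(results, key=lambda r: r[0])[1]


print(min_variance_chunked([10, 1, 7, 3, 2, 8, 9], 3, size=4))
```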