
I am working with the itertools package and am trying to create all possible combinations of 1s, 2s, and 3s in an array of 900 values, and then turn this into a 30-by-30 matrix. The code I have to do this is below and works fine.

import itertools
import numpy as np

for data in itertools.product([1, 2, 3], repeat=900):
    datalist = list(data)
    landarray = np.asarray(datalist).reshape(30, 30)

What I would like to do, however, is have it so that each value (1, 2, and 3) occurs exactly 300 times within the 900-value array. Thanks for your help!


3 Answers


You want to generate all permutations of the multiset np.repeat([1,2,3], 300). There is an algorithm that generates the next permutation in amortized O(1) time. Here's a simple approach that uses the C++ std::next_permutation() function and prints the permutations in lexicographic order:

#!/usr/bin/env python
"""Print all multiset permutations."""
import pyximport; pyximport.install()  # $ pip install cython
from next_permutation import next_permutation

n = 3
multiset = bytearray(b'a' * n + b'b' * n + b'c' * n)  # sorted = first permutation
print(multiset.decode())
while next_permutation(multiset):  # advances in place; False after the last one
    print(multiset.decode())

Here, next_permutation is a C extension module for Python, defined in Cython:

# cython: boundscheck=False
#file: next_permutation.pyx
cimport cpython.array # support array.array() on Python 2
from libcpp cimport bool

ctypedef unsigned char dtype_t
ctypedef dtype_t* Iter

cdef extern from "<algorithm>" namespace "std":
    # Expose std::next_permutation for unsigned char buffers.
    bool cpp_next_permutation "std::next_permutation" (Iter first, Iter last)

def next_permutation(dtype_t[:] a not None):
    """Advance `a` to its next lexicographic permutation, in place."""
    return cpp_next_permutation(&a[0], &a[0] + a.shape[0])

To build it, specify that the language is C++:

#file: next_permutation.pyxbld
from distutils.extension import Extension

def make_ext(modname, pyxfilename):
    return Extension(name=modname,
                     sources=[pyxfilename],
                     language="c++")

Output

aaabbbccc
aaabbcbcc
aaabbccbc
aaabbcccb
aaabcbbcc
aaabcbcbc
aaabcbccb
aaabccbbc
aaabccbcb
aaabcccbb
aaacbbbcc
aaacbbcbc
aaacbbccb
aaacbcbbc
aaacbcbcb
aaacbccbb
..snip..
cccaabbba
cccabaabb
cccababab
cccababba
cccabbaab
cccabbaba
cccabbbaa
cccbaaabb
cccbaabab
cccbaabba
cccbabaab
cccbababa
cccbabbaa
cccbbaaab
cccbbaaba
cccbbabaa
cccbbbaaa

The next_permutation() function accepts anything that supports the buffer interface, e.g., it supports NumPy arrays:

import numpy as np
multiset = np.repeat(np.array([1, 2, 3], dtype=np.uint8), 3)
while next_permutation(multiset):
    print(multiset)
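If you would rather skip the Cython build entirely, here is a pure-Python sketch of the same in-place lexicographic step that std::next_permutation performs (much slower, but the logic is identical; the function name mirrors the extension module's):

def next_permutation(a):
    """Advance sequence `a` to its next lexicographic permutation in place.

    Returns False after the last (descending) permutation, resetting `a`
    to the first (ascending) one -- the std::next_permutation contract.
    """
    # Find the rightmost position i whose suffix is not already descending.
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:                  # whole sequence is descending: wrap around
        a.reverse()
        return False
    # Swap a[i] with the rightmost element greater than it...
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # ...and reverse the suffix so it is the smallest possible continuation.
    a[i + 1:] = reversed(a[i + 1:])
    return True

It works on any mutable sequence, e.g. bytearray(b'aaabbbccc') or [1]*300 + [2]*300 + [3]*300.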
– jfs

Just shuffle an array that already has an even distribution.

import numpy as np

landarray = np.repeat([1, 2, 3], 300)
np.random.shuffle(landarray)
landarray = landarray.reshape((30, 30))

I guarantee you aren't going to get repeats of landarray. Which is to say: you would need to make about 10^213 landarrays before there's a 50/50 chance of having repeated one.
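A quick sanity check of those figures with exact integer arithmetic (a sketch; math.isqrt needs Python 3.8+):

import math

# Distinct 900-element arrays with exactly 300 of each value:
# the multinomial coefficient 900! / (300!)**3.
total = math.factorial(900) // math.factorial(300) ** 3
print(len(str(total)))              # 427, i.e. total is about 10**426

# Birthday approximation: ~50% collision chance after about sqrt(total) draws.
print(len(str(math.isqrt(total))))  # 214, i.e. about 10**213 shuffles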

– U2EF1
  • How did you get `10**20`? – jfs Apr 23 '14 at 04:37
  • There are `(90 choose 30) * (60 choose 30) ~ 8e40` ways of shuffling `[1,2,3]*300`. Birthday approximation gives ~50% chance of a collision after `sqrt((90 choose 30) * (60 choose 30)) ~= 2.8e20` rounds. – U2EF1 Apr 23 '14 at 04:46
  • It looks too low; e.g., there are `900!/300!/300!/300!` possible permutations, ~10**426 (assuming the PRNG period is much larger) – jfs Apr 23 '14 at 04:55
  • The period of the PRNG is ~10^6000, and yes, I'm off by a large factor: (90 choose 30) should be (900 choose 300). I'll update. – U2EF1 Apr 23 '14 at 05:06

(giggle) You do realize that your code generates about 10**430 matrices, right?

Even the restricted version produces about 10**426 matrices.

You could be at this a very long time.
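Both counts are easy to check exactly in Python (a quick sketch):

import math

unrestricted = 3 ** 900                                       # any value in any cell
restricted = math.factorial(900) // math.factorial(300) ** 3  # 300 of each value
print(len(str(unrestricted)))  # 430 digits -- the ~10**430 figure
print(len(str(restricted)))    # 427 digits -- the ~10**426 figure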


Edit for a sense of scale:

if every atom in the universe (about 10**80)

could do a billion billion operations per second (10**18)

and if you could process a billion matrices per operation (10**9)

and if you did this with a billion universes (10**9)

for a billion times the current age of our universe (about 10**26 seconds)

you would have reached a thousandth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a percent of being finished.

(I'm starting to feel like Carl Sagan ;-)
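For the record, the exponent arithmetic behind that punchline (a sketch):

# Matrices processed under the generous assumptions above:
# 10**80 atoms * 10**18 ops/s * 10**9 matrices/op * 10**9 universes * 10**26 s
processed = 80 + 18 + 9 + 9 + 26   # exponent: 10**142 matrices in total
needed = 426                       # exponent: ~10**426 restricted matrices
print(processed - needed)          # -284: a 10**-284 fraction (10**-282 percent)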

– Hugh Bothwell