I need to create a large numpy array containing random boolean values without hitting the swap.
My laptop has 8 GB of RAM. Creating a (1200, 2e6) array with np.ones takes less than 2 s and uses 2288.8 MiB (about 2.4 GB) of RAM:
>>> dd = np.ones((1200, int(2e6)), dtype=bool)
>>> dd.nbytes/1024./1024
2288.818359375
>>> dd.shape
(1200, 2000000)
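A NumPy bool takes one byte per element, so the footprint of any shape can be worked out without allocating it. The minimal sketch below does that for the three shapes in this question. `bool_array_mib` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def bool_array_mib(shape):
    """Hypothetical helper: size in MiB of a bool array of `shape`, computed without allocating it."""
    # int64 product avoids overflow where the default integer is 32-bit
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(bool).itemsize / 1024. / 1024.

for shape in [(1200, 400000), (1200, 800000), (1200, 2000000)]:
    print(shape, bool_array_mib(shape), 'MiB')
```

For (1200, 2000000) this gives 2288.818359375, the same figure as `dd.nbytes/1024./1024` for the np.ones array.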
For a relatively small (1200, 400e3) array, np.random.randint is still quite fast, taking roughly 5 s to generate the 458 MiB result:
import numpy as np
db = np.array(np.random.randint(2, size=(int(400e3), 1200)), dtype=bool)
print(db.nbytes / 1024. / 1024., 'MiB')
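Peak memory may be much higher than the final bool array suggests. np.random.randint returns an array of the platform's default integer type, which is 64-bit on typical 64-bit Linux and macOS builds. np.array(..., dtype=bool) then makes a separate converted copy, so during the conversion both arrays exist at once. The scaled-down sketch below measures that gap. The (4000, 1200) shape is an arbitrary stand-in chosen so the example runs quickly.

```python
import numpy as np

shape = (4000, 1200)  # scaled-down stand-in for (400e3, 1200)

ints = np.random.randint(2, size=shape)   # intermediate integer array
bools = np.array(ints, dtype=bool)        # converted copy: the array actually wanted

print(ints.dtype, ints.nbytes / 1024. / 1024., 'MiB')
print(bools.dtype, bools.nbytes / 1024. / 1024., 'MiB')

# while converting, roughly ints.nbytes + bools.nbytes must fit in RAM
ratio = ints.nbytes // bools.nbytes
print('intermediate is', ratio, 'times larger than the result')
```

With 8-byte integers, the ratio is 8. At the (800e3, 1200) size, that means several GB of integers just to produce a 915 MiB bool array.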
But if I double the size of the array to (1200, 800e3), I hit the swap and it takes ~2.7 min to create db ;(
import timeit

cmd = """
import numpy as np
db = np.array(np.random.randint(2, size=(int(800e3), 1200)), dtype=bool)
print(db.nbytes / 1024. / 1024., 'MiB')"""
print(timeit.Timer(cmd).timeit(1))
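The timed string above also contains `import numpy as np`, so any NumPy load time in that process is counted along with the array creation. Here is a small sketch that keeps the import in timeit's untimed `setup` argument and runs the statement once. The (8000, 1200) shape is an arbitrary scaled-down choice.

```python
import timeit

setup = "import numpy as np"  # runs once, not included in the timing
stmt = "db = np.array(np.random.randint(2, size=(8000, 1200)), dtype=bool)"

# number=1: time a single execution of stmt
elapsed = timeit.timeit(stmt, setup=setup, number=1)
print('%.3f s' % elapsed)
```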
Using random.getrandbits takes even longer (~8 min) and also uses the swap:
from random import getrandbits
import numpy as np
db = np.array([not getrandbits(1) for _ in range(int(1200*800e3))], dtype=bool)
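Two things plausibly make this version slow and memory hungry. First, there is one Python-level function call per element. Second, the intermediate list stores a pointer per element: 8 bytes on a 64-bit CPython build, or roughly 7.7 GB for 960e6 elements, before NumPy copies it into a bool array. The comprehension also yields a flat 1-D array, so a reshape is needed to get the 2-D shape. The scaled-down sketch below shows both points. The (120, 800) shape is an arbitrary stand-in.

```python
import sys
from random import getrandbits

import numpy as np

rows, cols = 120, 800  # scaled-down stand-in for (1200, 800e3)

# one Python call per element; the list holds a pointer to True/False for each one
bits = [not getrandbits(1) for _ in range(rows * cols)]
per_element = sys.getsizeof(bits) / float(len(bits))
print('list: about %.1f bytes per element' % per_element)

# flat 1-D bool array (1 byte per element), reshaped to the intended 2-D shape
db = np.array(bits, dtype=bool).reshape(rows, cols)
print(db.shape, db.nbytes, 'bytes')
```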
Using np.random.randint for a (1200, 2e6) array just gives a MemoryError.
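Here is a back-of-the-envelope estimate of why this size fails outright. It assumes randint's intermediate uses 8-byte integers, the default on typical 64-bit Linux and macOS builds. Under that assumption, the intermediate alone is far larger than 8 GB of RAM, before the 1-byte-per-element bool copy is even made:

```latex
\begin{align*}
\text{int64 intermediate:}\quad & 1200 \times 2\times10^{6} \times 8\ \text{B} = 1.92\times10^{10}\ \text{B} \approx 17.9\ \text{GiB} \\
\text{final bool array:}\quad & 1200 \times 2\times10^{6} \times 1\ \text{B} = 2.4\times10^{9}\ \text{B} \approx 2.24\ \text{GiB}
\end{align*}
```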
Is there a more efficient way to create a (1200, 2e6) random boolean array?