I am creating, multiplying, and then summing all elements of two big matrices in NumPy. I do this a few hundred times with two methods: a plain loop and with the help of the multiprocessing module (see the snippet below).
import numpy as np
from multiprocessing.pool import ThreadPool

def worker_loop(n):
    for i in n:
        mul = np.sum(np.random.normal(size=[i, i]) * np.random.normal(size=[i, i]))

def worker(i):
    mul = np.sum(np.random.normal(size=[i, i]) * np.random.normal(size=[i, i]))

n = range(100, 300)

pool = ThreadPool(2)
pool.map(worker, n)
pool.close()
pool.join()

worker_loop(n)
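For reference, this is roughly how I measure the two variants (a minimal sketch; the harness around `time.perf_counter` is my own, not part of the original script):

```python
import time
import numpy as np
from multiprocessing.pool import ThreadPool

def worker(i):
    # elementwise product of two random matrices, then a full reduction
    np.sum(np.random.normal(size=[i, i]) * np.random.normal(size=[i, i]))

n = range(100, 300)

# plain sequential loop
t0 = time.perf_counter()
for i in n:
    worker(i)
loop_time = time.perf_counter() - t0

# same work distributed over a pool of 2 worker threads
t0 = time.perf_counter()
with ThreadPool(2) as pool:
    pool.map(worker, n)
pool_time = time.perf_counter() - t0

print(f"loop: {loop_time:.3f}s  pool: {pool_time:.3f}s")
```

On my machine the loop time comes out lower than the pool time, which is what prompted this question.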
Measuring the time shows that the loop is faster than multiprocessing. I have also tried the threading module with no success (then I read that this was a bad idea; read more here).
I started experimenting with multithreading because I need to convert images, labels, bounding boxes, ... into tfrecords. For that I am studying a file from tensorflow/inception (if you want to dig in, see build_imagenet_data.py, line 453). I believe that multithreading works there, which is why they use it.
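As I understand it, that script splits the file list into shards and gives each thread one shard. A minimal sketch of that pattern (a simplification of build_imagenet_data.py; the filenames and the body of process_shard are stand-ins, not the real code):

```python
import threading

def process_shard(thread_index, filenames):
    # In the real script each thread reads image files and serializes them
    # into a TFRecord shard; the actual read/encode/write is omitted here.
    for name in filenames:
        pass  # read + encode + write would go here

# hypothetical file list, split round-robin into one shard per thread
filenames = ["img_%d.jpg" % i for i in range(8)]
num_threads = 2
shards = [filenames[i::num_threads] for i in range(num_threads)]

threads = []
for idx, shard in enumerate(shards):
    t = threading.Thread(target=process_shard, args=(idx, shard))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

My impression is that the work there is dominated by file I/O, which would behave differently from my NumPy example, but that is part of what I am asking below.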
With that said, my questions are as follows:
- What am I missing in my code? Is it possible to achieve a speedup with small modifications?
- Does the example from inception work because TensorFlow is written in C++ and CUDA?
- When is it advisable to use multiprocessing or multithreading with NumPy, TensorFlow, and the like?