Up till now, I have parallelized functions by mapping them onto lists that are distributed out to the various engines with map_sync(function, list).
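For concreteness, this is the pattern that has been working for me (square here is just a stand-in for my real functions):

    from IPython.parallel import Client

    rc = Client()
    dview = rc[:]

    def square(x):
        return x * x

    # each engine computes its share of the list
    dview.map_sync(square, [1, 2, 3, 4])
    # -> [1, 4, 9, 16]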
Now, I need to run a function on each entry of a dictionary.
map_sync does not seem to work on dictionaries. I have also tried to scatter the dictionary and use decorators to run the function in parallel, but dictionaries don't seem to lend themselves to scattering either. Is there some other way to parallelize functions on dictionaries without having to convert them to lists?
These are my attempts thus far:
    from IPython.parallel import Client

    rc = Client()
    dview = rc[:]

    test_dict = {'43': "lion", '34': "tiger", '343': "duck"}
    dview.scatter("test", test_dict)
    dview["test"]
    # this yields [['343'], ['43'], ['34'], []] across 4 engines;
    # only the keys were distributed and the values were dropped,
    # which suggests that a dictionary can't be scattered directly
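For comparison, scattering the dictionary's items as a list of (key, value) tuples does distribute whole entries, but it is exactly the list conversion I want to avoid (the output shown is from one of my runs, so treat the ordering as illustrative):

    # converting to a list of (key, value) tuples scatters cleanly
    dview.scatter("test_items", test_dict.items())
    dview["test_items"]
    # -> e.g. [[('343', 'duck')], [('43', 'lion')], [('34', 'tiger')], []]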
Needless to say, when I try to run a function on the dictionary itself, via the parallel decorator, I get an error:
    @dview.parallel(block=True)
    def run(d):
        # each engine should print its share of the dictionary
        for k, v in d.iteritems():
            print k, v

    run(test_dict)
    AttributeError                            Traceback (most recent call last)
    <ipython-input-...> in <module>()
    <ipython-input-...> in run(d)

    AttributeError: 'str' object has no attribute 'iteritems'

From the traceback, run seems to be handed the dictionary's keys as bare strings rather than the dictionary itself, hence the AttributeError.
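A quick check in plain Python is consistent with that reading: iterating a dictionary yields only its keys, so anything that partitions test_dict by iterating over it will hand the engines bare strings:

    # iterating a dict yields keys only, never the values
    list(test_dict)
    # -> something like ['343', '43', '34'] (order is arbitrary in Python 2)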
I don't know if it's relevant, but I'm using an IPython Notebook connected to a cluster of Amazon AWS instances.