Parallelize function on dictionary in IPython

Question

Up till now, I have parallelized functions by mapping them on to lists that are distributed out to the various clusters using the function map_sync(function, list) .

Now, I need to run a function on each entry of a dictionary.

map_sync does not seem work on dictionaries. I have also tried to scatter the dictionary and use decorators to run the function in parallel. However, dictionaries dont seem to lend themselves to scattering either. Is there some other way to parallelize functions on dictionaries without having to convert to lists?

These are my attempts thus far:

from IPython.parallel import Client
rc = Client()
dview = rc[:]

test_dict = {'43':"lion", '34':"tiger", '343':"duck"}
dview.scatter("test",test)

dview["test"]
# this yields [['343'], ['43'], ['34'], []] on 4 clusters
# which suggests that a dictionary can't be scattered?

Needless to say, when I run the function itself, I get an error:

@dview.parallel(block=True)
def run():
    for d,v in test.iteritems():
        print d,v

run()

AttributeError
Traceback (most recent call last) in () in run(dict) AttributeError: 'str' object has no attribute 'iteritems'

I don't know if it's relevant, but I'm using an IPython Notebook connected to Amazon AWS clusters.

score 3 · Accepted Answer · answered May 19 '13 at 03:28

You can scatter a dict with:

def scatter_dict(view, name, d):
    """partition a dictionary across the engines of a view"""
    ntargets = len(view)
    keys = d.keys() # list(d.keys()) in Python 3
    for i, target in enumerate(view.targets):
        subd = {}
        for key in keys[i::ntargets]:
            subd[key] = d[key]
        view.client[target][name] = subd

scatter_dict(dview, 'test', test_dict)

and then operate on it remotely, as you normally would.

You can also gather the remote dicts into one local one again with:

def gather_dict(view, name):
    """gather dictionaries from a DirectView"""
    merged = {}
    for d in view.pull(name):
        merged.update(d)
    return merged

gather_dict(dv, 'test')

An example notebook

Parallelize function on dictionary in IPython

1 Answers1