
Original version of my problem

I'm trying to do a brute-force search using scipy.optimize.brute.

The cost function can be evaluated if 4 parameters are given, but those 4 parameters must follow some conditions.

To handle this and some other complicated issues, I wrote my own Python class (simplified as Parameter in the example below), but some of its attributes get lost when I use multiprocessing via the workers keyword.

Simplified version of my problem

import numpy as np
from multiprocessing import Pool

class Parameter(np.ndarray):
    def __new__(cls, maximum):
        self = np.asarray([0., 0., 0., 0.], dtype=np.float64).view(cls)
        return self

    def __init__(self, maximum):
        self.maximum = maximum
        self.validity = True

    def isvalid(self):
        if self.sum() <= self.maximum:
            return True
        else:
            return False

    def set(self, arg):
        for i in range(4):
            self[i] = arg[i]
        self.validity = self.isvalid()

def cost(arg, para):
    para.set(arg)
    if para.validity:
        return para.sum()
    else:
        return para.maximum

class CostWrapper:
    def __init__(self, f, args):
        self.f = f
        self.args = [] if args is None else args

    def __call__(self, x):
        return self.f(np.asarray(x), *self.args)

if __name__ == '__main__':
    parameter = Parameter(100)
    wrapped_cost = CostWrapper(cost, (parameter,))
    parameters_to_be_evaluated = [np.random.rand(4) for _ in range(4)]
    with Pool(2) as p:
        res = p.map(wrapped_cost, parameters_to_be_evaluated)

This raises:

  File "\_bug_attribute_lose.py", line 126, in isvalid
    if self.sum() <= self.maximum:
AttributeError: 'Parameter' object has no attribute 'maximum'

However, calling wrapped_cost directly, without p.map, does not raise an error:

wrapped_cost(np.random.rand(4))

What I've tried

By putting print statements throughout my code, I found that both __new__ and __init__ are called only once, so I guess that the multiprocessing library somehow copies parameter.

Also, I found out that the copied version of parameter only contains attributes that np.ndarray has:

dir(para) = ['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmatmul__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'isvalid', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'set', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']

(note that neither 'maximum' nor 'validity' is present)

Therefore, I tried to implement a __copy__ method in the Parameter class, like this:

def __copy__(self):
    print('__copy__')
    new = Parameter(self.maximum)
    new.__dict__.update(self.__dict__)
    return new

but this did not work.

My questions:

  1. Some of the attributes that the Parameter object should have are lost. My guess is that the multiprocessing library somehow copies the variable parameter, and that I didn't implement the copy method properly. Am I right?

  2. If so, how can I implement it properly? If not, please let me know what causes the error.

UJung

1 Answer


It's a bit tricky but it's possible.

First, when inheriting from np.ndarray you should define the __array_finalize__ method, which copies your custom attributes from the source object onto each new instance. NumPy calls __array_finalize__ every time an instance is created, including through view casting and slicing, and it passes obj=None when the instance comes from an explicit constructor call, so you need a None guard. More about this in the docs.

def __array_finalize__(self, obj):
    if obj is None: return
    self.maximum = getattr(obj, 'maximum', None)
    self.validity = getattr(obj, 'validity', None)
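
To see when the hook fires, here is a minimal sketch that records what obj was on each call. The class name Tagged and its calls list are hypothetical and exist only for this trace; they are not part of the question's code.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical minimal subclass, used only to trace __array_finalize__."""
    calls = []  # type name of `obj` for every __array_finalize__ call

    def __new__(cls, maximum):
        # view casting: the hook runs here with obj = the plain ndarray
        obj = np.zeros(4, dtype=np.float64).view(cls)
        obj.maximum = maximum
        return obj

    def __array_finalize__(self, obj):
        Tagged.calls.append(type(obj).__name__)
        if obj is None:
            return
        self.maximum = getattr(obj, 'maximum', None)

p = Tagged(100)

# slicing creates a new instance from `p`, so the hook copies `maximum` over
head = p[:2]

# explicit constructor call: the hook receives obj=None and sets nothing
bare = np.ndarray.__new__(Tagged, (4,))

print(Tagged.calls)
print(head.maximum, hasattr(bare, 'maximum'))
```

Without the None guard the hook would still run in the last case, but there would be no source object to copy attributes from.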

Secondly, multiprocessing.Pool serializes the data with pickle before sending it to the workers. In the process, your extra attributes are lost, because the state that np.ndarray pickles does not include the instance's __dict__. So we have to add them back before continuing.
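
The loss can be reproduced without a pool. The sketch below roughly replays by hand what pickle does with an array subclass: ask the object for its rebuild recipe via __reduce__, then rebuild and restore the state. The class name Plain is a hypothetical stand-in for the question's Parameter with no pickle hooks.

```python
import numpy as np

class Plain(np.ndarray):
    """Hypothetical stand-in for Parameter, without any pickle hooks."""
    def __new__(cls, maximum):
        obj = np.zeros(4, dtype=np.float64).view(cls)
        obj.maximum = maximum
        return obj

original = Plain(100)

# __reduce__ returns (rebuild function, its arguments, state tuple)
rebuild, rebuild_args, state = original.__reduce__()

# the state carries shape, dtype and raw data, but no attribute dict
state_has_dict = any(isinstance(item, dict) for item in state)

# roughly what happens on the receiving side when the object is unpickled
clone = rebuild(*rebuild_args)
clone.__setstate__(state)

print(type(clone).__name__, np.array_equal(clone, original))
print(state_has_dict, hasattr(clone, 'maximum'))
```

The clone has the right type and data but no maximum attribute, which matches the AttributeError raised in the worker.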

Override the __reduce__ method so that the instance's __dict__ is appended to the pickled state:

def __reduce__(self):
    pickled_state = super().__reduce__()
    new_state = pickled_state[2] + (self.__dict__, )
    return (*pickled_state[0:2], new_state)

And override the __setstate__ method to restore it:

def __setstate__(self, state):
    self.__dict__.update(state[-1])
    super().__setstate__(state[0:-1])

The implementation was borrowed from this answer.

OK, now let's combine it into runnable code:

import numpy as np
from multiprocessing import Pool

class Parameter(np.ndarray):
    def __new__(cls, maximum):
        obj = np.asarray([0, 0, 0, 0], dtype=np.float64).view(cls)
        obj.maximum = maximum
        obj.validity = True
        return obj
    
    def __array_finalize__(self, obj):
        if obj is None: return
        self.maximum = getattr(obj, 'maximum', None)
        self.validity = getattr(obj, 'validity', None)

    def __reduce__(self):
        pickled_state = super().__reduce__()
        new_state = pickled_state[2] + (self.__dict__, )
        return (*pickled_state[0:2], new_state)
    
    def __setstate__(self, state):
        self.__dict__.update(state[-1])
        super().__setstate__(state[0:-1])

    def isvalid(self):
        return self.sum() <= self.maximum

    def set(self, arg):
        for i in range(4):
            self[i] = arg[i]
        self.validity = self.isvalid()

def cost(arg, para):
    para.set(arg)
    return para.sum() if para.validity else para.maximum

class CostWrapper:
    def __init__(self, f, args):
        self.f = f
        self.args = () if args is None else args

    def __call__(self, x):
        return self.f(np.asarray(x), *self.args)

if __name__ == '__main__':
    parameter = Parameter(100)
    wrapped_cost = CostWrapper(cost, (parameter,))
    parameters_to_be_evaluated = [np.random.rand(4) for _ in range(4)]
    with Pool(2) as p:
        res = p.map(wrapped_cost, parameters_to_be_evaluated)
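
A quick way to check the pickle hooks without starting a pool is a plain pickle round trip. This is a minimal sketch that repeats only the parts of the class relevant to pickling; the sample values written into the array are arbitrary.

```python
import pickle
import numpy as np

class Parameter(np.ndarray):
    def __new__(cls, maximum):
        obj = np.zeros(4, dtype=np.float64).view(cls)
        obj.maximum = maximum
        obj.validity = True
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.maximum = getattr(obj, 'maximum', None)
        self.validity = getattr(obj, 'validity', None)

    def __reduce__(self):
        # append the instance __dict__ to ndarray's own state tuple
        pickled_state = super().__reduce__()
        new_state = pickled_state[2] + (self.__dict__,)
        return (*pickled_state[0:2], new_state)

    def __setstate__(self, state):
        # last element is our __dict__, the rest belongs to ndarray
        self.__dict__.update(state[-1])
        super().__setstate__(state[0:-1])

original = Parameter(100)
original[:] = [1.0, 2.0, 3.0, 4.0]  # arbitrary sample values

# the same serialization step Pool applies to the arguments it ships
clone = pickle.loads(pickle.dumps(original))

print(clone.maximum, clone.validity, clone.tolist())
```

Keep in mind that each worker receives its own unpickled copy, so values written by para.set inside a worker do not change parameter in the parent process.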

By the way, did you know this question has already been asked? Here. That one doesn't cover your case of multiple attributes, though (which is an easy fix), so I will cut you some slack this time.

sanitizedUser
  • I didn't notice that the same question had been posted before. Both your answer and the post you linked are really helpful to me. Thank you very much! – UJung Aug 18 '20 at 12:27