I am calling dask.delayed on the following function, for multiple "self" objects (different objects of the same class) in a loop.
This is the delayed function, defined inside a custom subclass of keras.engine.training.Model:
    def fit(self, X: Union[pd.DataFrame, list, np.ndarray], y=None):
        if not isinstance(X, pd.DataFrame):
            super().fit(X, y)  # calls keras.Model.fit
            return self
        df = X
        X, y = self.to_supervised(df, to_train=True)  # transforms df into a supervised problem
        self.history = super().fit(X, y)
        return self
This is how I delay and compute the functions in the loop:
    def fit_estimator(estimator, train_df):
        return estimator.fit(train_df)  # calls the fit method shown above

    lazy_results = []
    for mlp in mlps:
        lazy_result = dask.delayed(fit_estimator)(mlp, train_df.copy())
        lazy_results.append(lazy_result)
    lazy_results = dask.compute(*lazy_results)
This subclass has an object attribute called "network" of type tensorflow.python.trackable.data_structures._DictWrapper. That object is serializable: both pickle.dumps and Dask's serialize API handle it, and serializing/deserializing a whole instance of the subclass also works without any problems. However, when I call lazy_results = dask.compute(*lazy_results), the error Exception: TypeError("can not serialize '_DictWrapper' object") is thrown. More specifically, returning "self" (and thus serializing it) appears to be the problem.
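For reference, a minimal helper like the following is how I check picklability on the client side before handing objects to Dask (the helper itself is mine, not part of the question's codebase; `mlp` and `network` are the names from above):

```python
import pickle

def is_picklable(obj):
    """Return True if obj survives a pickle round-trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

# e.g. check the fitted estimator and its `network` attribute:
#     assert is_picklable(mlp)
#     assert is_picklable(mlp.network)
print(is_picklable({"weights": [0.1, 0.2]}))
```

Both checks pass locally, which is why the failure inside dask.compute is surprising.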
- Dask version: 2022.12.1
- Python version: 3.9
If I do not return "self" from "fit" and instead return its serialization directly, it works fine, but then I have to deserialize it manually after the scheduler receives the serialized objects. Am I missing something, or is this a bug?
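This is a sketch of the workaround I mean, with a hypothetical ToyEstimator standing in for the Keras subclass (so the idea is self-contained; in the real code the bytes come back from dask.compute):

```python
import pickle

class ToyEstimator:
    """Hypothetical stand-in for the Keras subclass, for illustration only."""
    def __init__(self):
        self.history = None
    def fit(self, train_df):
        self.history = len(train_df)  # pretend training happened
        return self

def fit_estimator(estimator, train_df):
    # serialize on the worker ourselves and hand Dask plain bytes,
    # which it can always transport
    return pickle.dumps(estimator.fit(train_df))

# in the real code these byte strings are what dask.compute(*lazy_results)
# returns; the objects then have to be restored manually on the client:
payloads = [fit_estimator(ToyEstimator(), [1, 2, 3]) for _ in range(2)]
estimators = [pickle.loads(b) for b in payloads]
```

The extra pickle.loads step on the client side is exactly the manual deserialization I would like to avoid.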