Single process
If you're on a single machine and not using dask.distributed, then this doesn't matter. The variable x is already present in memory and doesn't need to be moved around.
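As a rough analogy, the following sketch uses Python's standard thread pool. Dask's default single-machine scheduler for dask.delayed and dask.array also runs tasks in a local pool of threads. The names f and x are hypothetical stand-ins for the ones discussed above. Because every thread shares the same process memory, the function simply reads x and nothing is serialized.

```python
from concurrent.futures import ThreadPoolExecutor

x = 10  # hypothetical stand-in for the variable x discussed above


def f(y):
    # Reads the global x at call time. The worker threads share this
    # process's memory, so nothing needs to be serialized or copied.
    return x + y


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(f, range(5)))

print(results)  # [10, 11, 12, 13, 14]
```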
Distributed or multi-process
If we have to move the function between processes, then we need to serialize it into a bytestring. Dask uses the cloudpickle library to do this.
The cloudpickle library converts the Python function f into a bytes object in a way that, in most settings, also captures the external variables it refers to. So one way to check whether your function will work with Dask is to serialize it and then deserialize it on another machine.
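One way to see why a plain pickle is not enough: the standard-library pickle module stores a function by reference, meaning only its module and qualified name. It does not store the function's code or captured variables. The sketch below uses hypothetical names (make_adder, f, x) and shows stdlib pickle refusing a nested function that closes over a variable. cloudpickle handles this kind of function by serializing it by value instead.

```python
import pickle

x = 10  # a global the function refers to


def make_adder(n):
    def add(y):
        return x + n + y  # n is captured in a closure, x is a global
    return add


f = make_adder(5)

try:
    pickle.dumps(f)
    outcome = "pickled"
except (pickle.PicklingError, AttributeError) as exc:
    # Stdlib pickle records only "module + qualified name", and a local
    # function such as make_adder.<locals>.add cannot be looked up by name.
    # (Older Pythons raise AttributeError here, newer ones PicklingError.)
    outcome = "failed: " + type(exc).__name__

print(outcome)
```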
import cloudpickle
b = cloudpickle.dumps(f)
cloudpickle.loads(b) # you may want to try this on your other machine as well
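The check above can be wrapped in a small helper. The following is a minimal sketch: roundtrip_call is a hypothetical name, not part of Dask or cloudpickle. It serializes a function, deserializes it, and confirms that the copy returns the same result as the original. It uses cloudpickle when installed and otherwise falls back to the standard pickle. Because it runs in a single process, it cannot catch problems that only appear on the remote machine, such as a module that is missing there.

```python
import pickle


def roundtrip_call(func, *args, serializer=None, **kwargs):
    """Serialize func, deserialize it, call both copies, and compare results.

    `serializer` is any module-like object with dumps/loads. By default this
    uses cloudpickle if it is installed and falls back to the standard pickle.
    """
    if serializer is None:
        try:
            import cloudpickle as serializer
        except ImportError:
            serializer = pickle
    payload = serializer.dumps(func)      # the bytes that would travel to a worker
    restored = serializer.loads(payload)  # rebuilt function (same process here)
    expected = func(*args, **kwargs)
    actual = restored(*args, **kwargs)
    if actual != expected:
        raise AssertionError(
            f"deserialized function returned {actual!r}, expected {expected!r}"
        )
    return actual
```

With cloudpickle installed, calling roundtrip_call(f, 1) exercises the same dumps/loads path as the snippet above and then also calls the result.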
How cloudpickle achieves this can be quite complex; see the cloudpickle documentation for details.
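To get a feel for what "external variables" means in practice, the following stdlib-only sketch lists the closure variables and globals a function refers to. This is roughly the information a by-value serializer has to capture. It is not cloudpickle's implementation, and external_variables, make_adder, f and x are hypothetical names. It also simplifies: it does not look inside nested functions, and because co_names also contains attribute names it can over-report globals.

```python
x = 10  # a global referenced by the inner function


def make_adder(n):
    def add(y):
        return x + n + y  # n comes from the closure, x from module globals
    return add


f = make_adder(5)


def external_variables(func):
    """Return (closure variables, referenced globals) for func, with values."""
    code = func.__code__
    closure = {}
    if func.__closure__:
        # co_freevars names line up one-to-one with the closure cells
        closure = {name: cell.cell_contents
                   for name, cell in zip(code.co_freevars, func.__closure__)}
    # co_names holds global (and attribute) names used by the bytecode;
    # keep only those that actually exist in the function's globals.
    globals_used = {name: func.__globals__[name]
                    for name in code.co_names if name in func.__globals__}
    return closure, globals_used


print(external_variables(f))  # ({'n': 5}, {'x': 10})
```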