I have a computation that runs much slower inside a Dask/Distributed worker than it does locally. I can reproduce it without any I/O, so I can rule out data transfer as the cause. The following code is a minimal reproducible example:
import time
import pandas as pd
import numpy as np
from dask.distributed import Client, LocalCluster


def gen_data(N=5000000):
    """ Dummy data generator """
    df = pd.DataFrame(index=range(N))
    for c in range(10):
        df[str(c)] = np.random.uniform(size=N)
    df["id"] = np.random.choice(range(100), size=len(df))
    return df


def do_something_on_df(df):
    """ Dummy computation that contains inplace mutations """
    for c in range(df.shape[1]):
        df[str(c)] = np.random.uniform(size=df.shape[0])
    return 42


def run_test():
    """ Test computation """
    df = gen_data()
    for key, group_df in df.groupby("id"):
        do_something_on_df(group_df)


class TimedContext(object):
    def __enter__(self):
        self.t1 = time.time()

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.t2 = time.time()
        print(self.t2 - self.t1)


if __name__ == "__main__":
    client = Client("tcp://10.0.2.15:8786")

    with TimedContext():
        run_test()

    with TimedContext():
        client.submit(run_test).result()
Running the test computation locally takes ~10 seconds, but it takes ~30 seconds from within Dask/Distributed. I also noticed that the Dask/Distributed workers output a lot of log messages like
distributed.core - WARNING - Event loop was unresponsive for 1.03s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive for 1.25s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive for 1.91s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive for 1.99s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive for 1.50s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive for 1.90s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive for 2.23s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
...
which are surprising, because it's not clear what is holding the GIL in this example.
Why is there such a big performance difference? And what can I do to get the same performance?
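One way to narrow this down is to profile the same function in both environments and compare where the time goes. The following is a minimal sketch using the standard-library `cProfile`. The helper `profile_call` and the stand-in workload `busy_work` are hypothetical names introduced here for illustration, not part of the original example. The report only shows where time is spent in the calling thread; it does not by itself explain the slowdown.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    Hypothetical helper: the report lists the `top` entries sorted by
    cumulative time, as plain text, so it can be returned from a worker.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def busy_work(n=100_000):
    """Stand-in workload; swap in run_test from the example above."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Local profile.
    _, local_report = profile_call(busy_work)
    print(local_report)
    # Assumed usage on a cluster (requires a connected Client):
    #   _, worker_report = client.submit(profile_call, run_test).result()
    #   print(worker_report)
```

Comparing the two reports side by side should show whether the extra time is spent in the same calls or in something that only appears inside the worker.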
Disclaimer: I am self-answering this question for documentation purposes.