
I have a periodic batch job running on my laptop. The code looks like this:

import datetime
import time

from dask.distributed import Client

# get_data_from_parquet and Metric1..Metric3 are my own helpers (not shown here).
client = Client()
print(client.scheduler_info())
topic = 'raw_data'
start = datetime.datetime.now()
delta = datetime.timedelta(minutes=2)
while True:
    end = start + delta
    if end <= datetime.datetime.now():
        print("It's time to run the analysis for the last 2 mins")
        data = get_data_from_parquet('raw_data_fast_par.par', start=start, end=end)
        metrics = [Metric1(), Metric2(), Metric3()]
        print(data.npartitions)
        channels = data.groupby(['col1', 'col2', 'col3'])
        for metric in metrics:
            features = metric.map_job(channels, start, end)
            print(features.count().compute())
        start = end  # advance the window only after it has been analysed
    else:
        time.sleep(1)  # avoid busy-waiting until the window closes

In short, every two minutes I run some analysis on data that I read from a parquet file, with the date filter pushed down as a predicate. It is a test, so I know it doesn't make much sense yet. I get the following warning in the terminal. Could someone explain why this is happening, whether it matters, and how I can avoid it?

distributed.comm.tcp - WARNING - Closing dangling stream in <TCP local=tcp://127.0.0.1:55448 remote=tcp://127.0.0.1:42197>
Apostolos

1 Answer


I don't know what the actual issue is, but you might try cleanly closing down your local cluster when you're done, perhaps by using Client as a context manager.

with Client() as client:
    ...
MRocklin
  • Will try and get back. I did try restart at the end of the for loop, but it didn't have any effect and I still got that message – Apostolos Nov 28 '18 at 14:02
  • This seems to have gotten rid of the warning, but now I lose the "historical" data of the execution. Any workarounds for that? – Apostolos Dec 04 '18 at 17:17
  • I managed to solve the issue by first instantiating a local cluster and then passing it to a client, with `cluster = LocalCluster()` followed by `client = Client(cluster)`, though I don't know how to explain why it works. – Apostolos Dec 04 '18 at 17:25
  • @Apostolos the usage of `LocalCluster` is documented [here](https://docs.dask.org/en/latest/setup/single-distributed.html#localcluster). I used `with LocalCluster(**kwargs) as cluster:` and a nested `with Client(cluster) as client:` to mitigate the *dangling stream* warning. I had previously been using `cluster.close()` and `client.close()`; the `with` statements appear to work more reliably. – jeschwar Apr 25 '19 at 18:53
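
For readers who want to see the nested `with` pattern from the comments written out, here is a minimal sketch. It is not the code from the question: `square` and `run_batches` are hypothetical stand-ins for the per-window analysis, and the `LocalCluster` arguments (one worker, two threads, `processes=False`) are assumptions chosen only to keep the example small. The point is that the cluster and client are opened once, reused for every iteration, and closed in order when the block exits.

```python
from dask.distributed import Client, LocalCluster


def square(x):
    # Hypothetical stand-in for the real per-window analysis.
    return x * x


def run_batches(batches):
    """Process every batch on one long-lived local cluster, then shut it down."""
    results = []
    # Assumed settings: a single threaded worker is enough for this sketch.
    with LocalCluster(n_workers=1, threads_per_worker=2, processes=False) as cluster:
        # The client is closed first, then the cluster, when the blocks exit.
        with Client(cluster) as client:
            for batch in batches:
                futures = client.map(square, batch)
                results.append(sum(client.gather(futures)))
    return results


if __name__ == "__main__":
    print(run_batches([[1, 2, 3], [4, 5]]))  # prints [14, 41]
```

Because the loop runs inside both `with` blocks, the same scheduler serves every iteration, so the periodic `while True:` loop from the question would go where the `for batch in batches:` loop is here.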