Questions tagged [dask.distributed]

5 questions
5
votes
1 answer

memory usage when indexing a large dask dataframe on a single multicore machine

I am trying to turn the Wikipedia CirrusSearch dump into a Parquet-backed dask dataframe indexed by title on a 450G 16-core GCP instance. CirrusSearch dumps come as a single file in JSON Lines format. The English Wikipedia dumps contain 5M records and…
Daniel Mahler
3
votes
1 answer

File Not Found Error in Dask program run on cluster

I have 4 machines: M1, M2, M3, and M4. The scheduler, the client, and a worker run on M1. I've put a CSV file on M1. The rest of the machines are workers. When I run the program with read_csv in dask, it gives me a "file not found" error.
Dhruv Kumar
2
votes
0 answers

Bokeh UI not working with DASK on another host

I've run dask with bokeh on a cluster of 4 machines. When I open the dask UI page on :8787, the graphs etc. are missing (empty UI), but the normal text and simple graphics are there. Logs: I'm getting this error in the console. Error
2
votes
1 answer

`dask-kubernetes` scheduler - worker on AWS

I've been trying to set up a dask.distributed cluster using Kubernetes. Setting up the kube cluster itself is pretty straightforward; the problem I am currently struggling with is that I can't get the local scheduler to connect to the workers.…
0
votes
1 answer

AttributeError: 'DataFrame' object has no attribute '_example'

I am trying to join a few geodataframes using the Dask Python package. While implementing my data processing algorithm I ran into the following exception: AttributeError: 'DataFrame' object has no attribute '_example' Here is my code: import…
Tequila