Questions tagged [dask.distributed]
5 questions
5 votes, 1 answer
memory usage when indexing a large dask dataframe on a single multicore machine
I am trying to turn the Wikipedia CirrusSearch dump into a Parquet-backed Dask dataframe indexed by title on a 450G 16-core GCP instance.
CirrusSearch dumps come as a single JSON-lines formatted file.
The English Wikipedia dumps contain 5M records and…
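
As a rough illustration of the "single JSON-lines file" format the excerpt mentions, the sketch below streams such a file in fixed-size batches using only the standard library, so the whole dump never has to sit in memory at once. The function name `iter_json_batches` and the `batch_size` default are hypothetical, and this is not the asker's code or a Dask API.

```python
import json
from itertools import islice


def iter_json_batches(path, batch_size=10_000):
    """Yield lists of parsed records from a JSON-lines file.

    Hypothetical helper: each non-blank line is assumed to hold
    exactly one JSON document.
    """
    with open(path, encoding="utf-8") as handle:
        # Lazily parse one line at a time, skipping blank lines.
        records = (json.loads(line) for line in handle if line.strip())
        while True:
            # Pull at most batch_size records from the lazy stream.
            batch = list(islice(records, batch_size))
            if not batch:
                return
            yield batch
```

Each yielded batch could then be handed to whatever builds the dataframe partitions; how that is done in the original question is not shown in the excerpt.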

Daniel Mahler
3 votes, 1 answer
File Not Found Error in Dask program run on cluster
I have 4 machines: M1, M2, M3, and M4. The scheduler, the client, and a worker run on M1. I've put a CSV file on M1. The rest of the machines are workers.
When I run the program with read_csv in Dask, it gives me an error: file not found
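
One way to check a setup like the one described (file present on M1 only) is to ask every worker whether it can see the path. The sketch below assumes `client` is a `dask.distributed.Client`, whose `run` method executes a function on all workers and returns a dict keyed by worker address. The helper name `workers_missing_file` is hypothetical, and this is a diagnostic sketch rather than the accepted answer.

```python
import os


def workers_missing_file(client, path):
    """Return the addresses of workers on which `path` does not exist.

    `client` is assumed to behave like dask.distributed.Client:
    client.run(fn, *args) calls fn on every worker and returns
    {worker_address: result}.
    """
    exists_by_worker = client.run(os.path.exists, path)
    # Keep only the workers that reported the file as absent.
    return sorted(addr for addr, exists in exists_by_worker.items() if not exists)
```

If this returns a non-empty list, those workers cannot open the file locally, which would be consistent with the error above when the CSV lives on M1 only.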

Dhruv Kumar
2 votes, 0 answers
Bokeh UI not working with DASK on another host
I've run Dask with Bokeh on a cluster with 4 machines.
Now I've opened the Dask UI page on :8787,
but the graphs etc. are not there.
Empty UI
The normal text and simple graphics are there, though.
Logs
I'm getting this error in the console.
Error

Dhruv Kumar
2 votes, 1 answer
`dask-kubernetes` scheduler - worker on AWS
I've been trying to set up a dask.distributed cluster using Kubernetes. Setting up the kube cluster itself is pretty straightforward; the problem I am currently struggling with is that I can't get the local scheduler to connect to the workers.…

Matti Lyra
0 votes, 1 answer
AttributeError: 'DataFrame' object has no attribute '_example'
I am trying to join a few geodataframes using the Dask Python package.
While implementing my data processing algorithm, I ran into the following exception:
AttributeError: 'DataFrame' object has no attribute '_example'
Here is my code:
import…

Tequila