Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to this Python interface.
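A minimal sketch of the interface described above, assuming only that dask is installed. Wrapping a function with delayed returns a Delayed proxy. Calling that proxy records a task in a graph instead of running it, and compute() executes the graph. The functions inc and add are illustrative.

```python
from dask import delayed


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# Calling delayed-wrapped functions builds Delayed proxies; nothing runs yet.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)  # a three-task graph: two incs feeding one add

# compute() executes the graph; the synchronous scheduler runs it in this thread.
result = total.compute(scheduler="synchronous")
print(result)  # 5
```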

290 questions
1 vote, 1 answer

How to resubmit dask jobs (from workers that died) when new workers are added after the old workers have died?

Is there a way in dask to resubmit jobs from dead workers when new workers are added after the old workers have died? Might it be possible to achieve this using a scheduler plugin, distributed.diagnostics.plugin.SchedulerPlugin?
Tianyang Li
1 vote, 1 answer

Changing dask delayed inputs without recreating graph

I have a series of computations on some data, which I'm modelling as a graph with dask delayed, and it works well; however, the graph itself takes longer to create than the calculations take to run (or a comparable time). I add data throughout the day,…
Mithra
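One possible approach is to write the pipeline once as a raw Dask task graph (a plain dict) and swap the value stored under the input key before each run. This assumes the graph's structure stays the same from run to run and only the input data changes. The key names and the clean, total and scale functions are hypothetical stand-ins for the real computations.

```python
from dask.threaded import get


def clean(data):
    return [v for v in data if v is not None]


def total(values):
    return sum(values)


def scale(s, factor):
    return s * factor


# Hypothetical pipeline written once as a raw task graph.
# "raw" is a placeholder key whose value is replaced for each new batch.
graph = {
    "raw": [],
    "cleaned": (clean, "raw"),
    "sum": (total, "cleaned"),
    "scaled": (scale, "sum", 10),
}


def run(new_data):
    # Shallow-copy the graph and swap in the new input; the structure is reused.
    dsk = dict(graph)
    dsk["raw"] = new_data
    return get(dsk, "scaled")


print(run([1, None, 2]))  # 30
print(run([5, 5]))        # 100
```

Copying a small dict is cheap compared with rebuilding many Delayed objects. Whether this removes the bottleneck in the question depends on where the construction time is actually spent.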
1 vote, 1 answer

Dask - flexible memory allocation for LocalCluster

I've run into some memory problems while using dask's LocalCluster. I'm working on a machine with 32 CPUs, but I have only 64 GB of RAM available. I'm instantiating the cluster like this: cluster = LocalCluster( n_workers=os.cpu_count(), …
Piotr Rarus
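With LocalCluster, the memory_limit argument applies to each worker. Running one worker per CPU on a 32-CPU, 64 GB machine therefore leaves each worker about 2 GB. The hypothetical helper below only does that arithmetic, trading worker processes for threads so that each process gets a larger share of memory. It does not start a cluster.

```python
def cluster_kwargs(total_ram_gb, n_cpus, target_gb_per_worker):
    """Hypothetical helper: choose LocalCluster-style keyword arguments so that
    each worker process gets roughly target_gb_per_worker of memory."""
    # No more workers than CPUs, and at least one worker.
    n_workers = max(1, min(n_cpus, total_ram_gb // target_gb_per_worker))
    # Spread the CPUs across the workers as threads.
    threads_per_worker = max(1, n_cpus // n_workers)
    # memory_limit is per worker, so split the total RAM between workers.
    memory_limit_gb = total_ram_gb / n_workers
    return {
        "n_workers": int(n_workers),
        "threads_per_worker": int(threads_per_worker),
        "memory_limit": f"{memory_limit_gb:g}GB",
    }


# One worker per CPU: 32 workers with 2 GB each.
print(cluster_kwargs(64, 32, 2))
# Fewer, larger workers: 8 processes x 4 threads with 8 GB each.
print(cluster_kwargs(64, 32, 8))
```

The returned dict could be passed as LocalCluster(**kwargs). Whether fewer processes with more threads suits the workload depends on how much of the work releases the GIL, which this sketch does not consider.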
1 vote, 1 answer

Strategy to distribute large number of jobs with dask on HPC cluster

I have a rather complex Python algorithm that I need to distribute across an HPC cluster. The code is run from a JupyterHub instance with 60 GB of memory. The configuration of the PBS cluster is 1 process, 1 core, 30 GB per worker, nanny=False (the…
Mike
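When the number of jobs is large, a frequently recommended pattern is to group many small items into each task so the scheduler handles fewer, coarser tasks. The sketch below assumes the per-item work is independent; process_one is a hypothetical stand-in for it. It uses the synchronous scheduler so it runs without a cluster. Without the scheduler argument, the same dask.compute call would use an active distributed Client, such as one connected to a PBS cluster.

```python
import dask
from dask import delayed


def process_one(item):
    # Hypothetical stand-in for the real, expensive per-item algorithm.
    return item * item


def process_batch(items):
    # One task handles many items, which keeps the task graph small.
    return [process_one(i) for i in items]


def batched(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]


items = list(range(10_000))
# 10 batch tasks instead of 10,000 single-item tasks.
tasks = [delayed(process_batch)(chunk) for chunk in batched(items, 1_000)]

batches = dask.compute(*tasks, scheduler="synchronous")
results = [r for batch in batches for r in batch]
print(len(tasks), len(results))  # 10 10000
```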
1 vote, 2 answers

Where does dask store files while running on JupyterLab?

I'm running dask on JupyterLab. I'm trying to save some files in the home directory where my Python file is stored. The code runs properly, but I'm not able to find out where my files are getting saved. So I made a folder named output in the home directory…
Chris_007
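In a distributed setup, tasks run in worker processes whose working directory may differ from the notebook's. On a remote cluster, the files may even land on another machine. A minimal sketch follows, assuming a hypothetical output folder. It resolves the output path to an absolute path before writing, and it asks a task where relative paths would resolve.

```python
import os
import tempfile
from pathlib import Path

import dask
from dask import delayed


@delayed
def save_result(value, out_dir):
    # Use an absolute path so the location does not depend on the
    # working directory of whichever process runs the task.
    path = Path(out_dir).resolve() / f"result_{value}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(value * 2))
    return str(path)


@delayed
def where_am_i():
    # Relative paths used inside a task resolve against this directory.
    return os.getcwd()


out_dir = Path(tempfile.mkdtemp()) / "output"  # hypothetical output folder
paths, cwd = dask.compute(
    [save_result(i, out_dir) for i in range(3)], where_am_i(), scheduler="synchronous"
)
print(cwd)
print(paths)
```

With a distributed Client, client.run(os.getcwd) returns the working directory of each worker.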
1 vote, 1 answer

Parallel computation with Dask and Xarray

I have the following function @dask.delayed def load_ds(p): import xarray as xr multi_file_dataset = xr.open_mfdataset(p, combine='by_coords', concat_dim="time", parallel=True) mean = multi_file_dataset['tas'].mean(dim='time') return…
Fab
1 vote, 1 answer

Dask delayed sum gets killed but there are enough resources

I'm creating a function that reads an entire folder, creates a Dask dataframe, then processes the partitions of this dataframe and sums the results, like this: import dask.dataframe as dd from dask import delayed, compute def…
6659081
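The question does not show why the process is killed, so this is only one commonly suggested pattern. It assumes the per-partition results can be combined incrementally. The idea is to reduce the partial results in small groups (a tree reduction) instead of handing every partial result to a single task. The data and function names are hypothetical. With a Dask DataFrame, ddf.to_delayed() yields one Delayed object per partition, and those could replace partitions here once process_partition is adapted to pandas input.

```python
from dask import delayed


@delayed
def process_partition(part):
    # Hypothetical per-partition work that returns a small summary, not the data.
    return sum(part)


@delayed
def add_all(values):
    return sum(values)


def tree_sum(parts, fan_in=4):
    # Combine results in groups of fan_in so no single task receives every partial.
    while len(parts) > 1:
        parts = [add_all(parts[i:i + fan_in]) for i in range(0, len(parts), fan_in)]
    return parts[0]


partitions = [list(range(i * 100, (i + 1) * 100)) for i in range(20)]  # hypothetical data
partials = [process_partition(p) for p in partitions]
total = tree_sum(partials)
print(total.compute(scheduler="synchronous"))
```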
1 vote, 2 answers

Dask : how to parallelize and serialize methods?

I am trying to parallelize methods from a class using Dask on a PBS cluster. My greatest challenge is that this method should parallelize some computations, then run further parallel computations on the result. Of course, this should be distributed on…
Mike
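A sketch of chaining two parallel stages with dask.delayed: wrap bound methods with delayed and pass the first-stage Delayed results straight into the second stage. This assumes the instance can be pickled, which is required once tasks are shipped to PBS workers. The Model class and its methods are hypothetical.

```python
from dask import delayed


class Model:
    """Hypothetical class; the method names are illustrative only."""

    def __init__(self, offset):
        self.offset = offset

    def stage_one(self, x):
        return x + self.offset

    def stage_two(self, values):
        return sum(values)


model = Model(offset=10)

# First parallel stage: independent calls to a bound method.
firsts = [delayed(model.stage_one)(x) for x in range(5)]
# Second stage depends on all first-stage results; Dask runs it after them.
second = delayed(model.stage_two)(firsts)

print(second.compute(scheduler="synchronous"))  # 60
```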
1 vote, 1 answer

Why does merging with Dask Delayed take so much longer than merging with Dask's built-in command?

I want to merge a large pandas dataframe with shape df1.shape = (80000, 18) with a small one with shape df2.shape = (1, 18) on a column called "key". Here is the time performance using dd.merge: ddf1 = from_pandas(df1, npartitions=20) ddf2 =…
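The question's timings are not reproduced here. For reference, merging a large frame with a one-row frame amounts to merging each partition of the large frame with the same small frame. The sketch below expresses that with dask.delayed, wrapping the small frame once so that every merge task shares a single graph node. The frame contents and partition sizes are illustrative.

```python
import pandas as pd
from dask import delayed

# Hypothetical data: a "large" frame split into partitions and a one-row frame.
df1 = pd.DataFrame({"key": [i % 3 for i in range(12)], "a": range(12)})
df2 = pd.DataFrame({"key": [1], "b": ["x"]})
parts = [df1.iloc[i:i + 4] for i in range(0, len(df1), 4)]

# Wrap the small frame once so every merge task depends on the same node.
small = delayed(df2)
merged_parts = [delayed(pd.merge)(part, small, on="key") for part in parts]
merged = delayed(pd.concat)(merged_parts, ignore_index=True)

out = merged.compute(scheduler="synchronous")
print(out)
```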
1 vote, 1 answer

How to concat on axis=1 with Dask delayed? (simplified)

Pandas and Dask produce different results (because I'm doing something wrong in Dask I think). I want to get the Dask result to match the Pandas one here. This toy program should run as-is to demonstrate: import dask import dask.dataframe as…
user5406764
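A minimal sketch, assuming each piece is a pandas Series with a shared index. Pass the list of Delayed pieces and the axis=1 keyword through delayed(pd.concat), so the concatenation happens on the computed pandas objects. make_column is a hypothetical stand-in for the toy program's per-column work.

```python
import pandas as pd
from dask import delayed


@delayed
def make_column(name, n):
    # Hypothetical per-column computation returning a Series with a default index.
    return pd.Series(range(n), name=name)


cols = [make_column(name, 3) for name in ["a", "b", "c"]]
# Keyword arguments such as axis=1 are forwarded to pd.concat when the graph runs.
frame = delayed(pd.concat)(cols, axis=1)

result = frame.compute(scheduler="synchronous")
expected = pd.concat([pd.Series(range(3), name=n) for n in "abc"], axis=1)
print(result.equals(expected))  # True
```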
1 vote, 0 answers

Dask : 'DataFrame' object has no attribute '_meta'

I tried to connect to MS SQL Server and load a dataframe into it, but while connecting I keep getting "no attribute '_meta'". I am new to Dask DataFrame; can someone help me out? It would be very…
Dinesh Gedda
1 vote, 0 answers

Sorting dataset along axis with dask

I want to sort a dataset (netCDF file) along the time dimension for each year and then average the results. The problem is that dask only supports 'topk' sorting, which consumes all the memory if I include the whole range of values. Xarray only supports sorting of 1D…
wol
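One workaround is to rechunk so that the time axis is a single chunk, then sort every block with NumPy via map_blocks. This assumes that each chunk of the non-time axes fits in memory together with the full time axis. The array shape and the top-10 average are hypothetical. Selecting individual years could be done by slicing the time axis before sorting.

```python
import numpy as np
import dask.array as da

# Hypothetical (time, y, x) data, chunked along time as a file-backed array might be.
rng = np.random.default_rng(0)
data = rng.random((365, 4, 6))
arr = da.from_array(data, chunks=(73, 2, 3))

# Put the whole time axis in one chunk; the other axes stay chunked.
arr_full_time = arr.rechunk({0: -1})

# Sort each block along time with NumPy; the blocks are independent.
sorted_arr = arr_full_time.map_blocks(np.sort, axis=0)

# Example reduction: mean of the 10 largest values per grid cell.
top10_mean = sorted_arr[-10:].mean(axis=0)
result = top10_mean.compute()
print(result.shape)  # (4, 6)
```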
1 vote, 1 answer

Is it better to `compute` for control flow or build a fully-`delayed` task graph?

I have an existing Pandas codebase and have just started trying to convert it to Dask. I am still trying to wrap my head around Dask dataframe, delayed, and distributed. From reading over the dask.delayed docs, it seems like the ideal case would be…
Louis
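A small comparison with hypothetical functions. Delayed objects cannot be used directly in an if statement, which leaves two options. One is to compute the condition eagerly and branch in ordinary Python. The other is to move the branch into a delayed function so the whole pipeline stays one graph.

```python
from dask import delayed


@delayed
def load(n):
    return list(range(n))


@delayed
def summarize(values):
    return sum(values)


@delayed
def choose(total, values):
    # The branch lives inside a task, so the graph stays fully lazy.
    if total > 10:
        return max(values)
    return min(values)


values = load(6)
total = summarize(values)

# Option 1: compute eagerly and branch in Python.
# Each compute() call below re-runs load, which is one cost of mixing styles.
if total.compute(scheduler="synchronous") > 10:
    eager = max(values.compute(scheduler="synchronous"))
else:
    eager = min(values.compute(scheduler="synchronous"))

# Option 2: keep the branch lazy by moving it into a delayed function.
lazy = choose(total, values)
print(eager, lazy.compute(scheduler="synchronous"))  # 5 5
```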
1 vote, 1 answer

Running Embarrassingly Parallel operations on a single piece of data using Dask

I was following this tutorial and I was able to parallelize a for loop where operations were done independently on multiple files. However, now I need to perform an iterative function to extract variables into 300 files from a single xarray dataset…
Bhanu Magotra
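A sketch that makes the load a delayed call and passes that one Delayed object to every extraction task. Within a single dask.compute call, the shared load runs only once. This assumes the single dataset can be loaded once and shared between tasks. load_dataset and extract are hypothetical stand-ins for opening the xarray dataset and pulling out one variable.

```python
import dask
from dask import delayed

load_calls = []  # records how many times the load actually runs


@delayed
def load_dataset():
    # Hypothetical stand-in for opening the single large dataset.
    load_calls.append(1)
    return {f"var{i}": list(range(i, i + 3)) for i in range(300)}


@delayed
def extract(dataset, name):
    # Hypothetical stand-in for extracting one variable (e.g. before writing a file).
    return name, sum(dataset[name])


ds = load_dataset()  # one node shared by every downstream task
jobs = [extract(ds, f"var{i}") for i in range(300)]
results = dask.compute(*jobs, scheduler="synchronous")
print(len(results), len(load_calls))  # 300 1
```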
1 vote, 1 answer

Save larger than memory Dask array to hdf5 file

I need to save dask arrays to hdf5 when using dask distributed. My situation is very similar to the one described in this issue: https://github.com/dask/dask/issues/3351. Basically this code will work: import dask.array as da from distributed import…
Eric Eckert
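A minimal sketch of the store mechanism, using an in-memory NumPy array as a stand-in target so it runs without h5py. da.store writes the source array chunk by chunk into any object that supports slice assignment. With an h5py Dataset as the target, or via da.to_hdf5, the data need not fit in memory. The distributed-specific problem from the linked issue is not addressed here. The file name in the comment is hypothetical.

```python
import numpy as np
import dask.array as da

# Hypothetical source data, chunked into 10 row blocks.
source = np.arange(1_000_000, dtype="float64").reshape(1000, 1000)
x = da.from_array(source, chunks=(100, 1000))

# Stand-in target: any object that supports target[slices] = block works.
target = np.empty(x.shape, dtype=x.dtype)

# lock=True serializes the writes, which file-backed targets typically need.
da.store(x, target, lock=True)

# With h5py (not required for this sketch):
# with h5py.File("out.h5", "w") as f:
#     dset = f.create_dataset("x", shape=x.shape, dtype=x.dtype)
#     da.store(x, dset)
```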