
Is it possible to create a column from a delayed function in Dask?

E.g., if we create a column in PySpark with `df.withColumn('datetime', F.lit(datetime.now()))`, the value of this column is not calculated until we request it.

My question is: can we do a similar thing in Dask? As far as I know, Dask is also lazy by default, but there seems to be no way to achieve the same result as in PySpark.

    Rather than asking us to translate spark, can you just describe more specifically what you’re trying to do in dask? Also, have you seen the function `dask.dataframe.from_delayed`? – Michael Delgado Feb 28 '23 at 17:11
  • I have a dataframe and want to add a column that stores the timestamp of the compute time. If I just set the column to `datetime.now()`, the value can differ significantly from the desired one when the computation time is lengthy. A workaround is to set the column after compute; I'm just wondering whether there is another way to do this. – Hawii Hawii Feb 28 '23 at 17:57
  • Huh. You want an entire column to have the same timestamp over and over? Seems like a good use case for a standalone variable… but yeah you could use `from_delayed` to do this. Alternatively if you want to have the time vary by partition you could map a function which assigns the column using `df.map_partitions` – Michael Delgado Feb 28 '23 at 19:06
  • 1
    If you want implementation help please edit the question to remove the spark references and clarify your goals and then set up a sample problem using code as a [mre]. Thanks! – Michael Delgado Feb 28 '23 at 19:08

0 Answers