As of October 6, 2021, there is not yet a built-in implementation of groupby median in Dask. There is an open feature request for it on the Dask issue tracker.
Workaround for Specific Cases
From that same issue, the code below works for the specific case in which all of the data for each group fits on exactly one partition:
import dask
import dask.dataframe as dd

ddf = dask.datasets.timeseries()
# setting the index sorts the data so rows with the same id are contiguous;
# the workaround assumes each group then fits within a single partition
ddf = ddf.set_index('id')

median_fun = dd.Aggregation(
    name="median",
    # computes the median of each group within each partition
    chunk=lambda s: s.median(),
    # combines results across partitions; because each group lives on exactly
    # one partition, the input holds a single value and sum() returns it unchanged
    agg=lambda s0: s0.sum(),
)
median_ddf = ddf.groupby("id")["x"].agg(median_fun)
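As a quick sanity check (my own addition, not from the issue), you can compare the result against an exact pandas computation on data small enough to fit in memory:

import pandas as pd

# compare against an exact pandas groupby median (feasible only for small data)
result = median_ddf.compute()
expected = ddf.compute().groupby("id")["x"].median()
pd.testing.assert_series_equal(result.sort_index(), expected.sort_index())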
General Solution
For larger datasets, where a single group may span multiple partitions, you could construct a custom aggregation function that calculates the median (or the 50th percentile) using `dd.groupby.Aggregation`. If you do this, consider submitting it as a PR to resolve the feature request mentioned above.
See docs here: https://docs.dask.org/en/stable/generated/dask.dataframe.groupby.Aggregation.html#dask-dataframe-groupby-aggregation
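As a starting point, here is a minimal sketch of such an aggregation (my own illustration, not code from the Dask issue or docs). It computes an exact median even when a group spans multiple partitions by collecting each group's values into Python lists, which assumes each group's complete data still fits in memory on one worker:

import numpy as np
import dask
import dask.dataframe as dd

exact_median = dd.Aggregation(
    name="exact_median",
    # collect each group's values on each partition into a Python list
    chunk=lambda s: s.apply(list),
    # concatenate the per-partition lists belonging to the same group
    agg=lambda s0: s0.apply(lambda parts: [v for part in parts for v in part]),
    # compute the median over each group's fully collected values
    finalize=lambda s1: s1.apply(np.median),
)

ddf = dask.datasets.timeseries()
median_ddf = ddf.groupby("name")["x"].agg(exact_median)

The trade-off in this design is memory: the chunk/agg steps move every value for a group to one place before finalize runs, so it scales with the size of the largest group rather than the whole dataset.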
Median vs 50th Percentile
Note that the median is by definition the 50th percentile, so for most practical purposes the two are interchangeable when working with large datasets (on finite samples, different interpolation methods can produce slightly different values): https://math.stackexchange.com/questions/2048470/is-50th-percentile-equal-to-median
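This equivalence is relevant because Dask already provides an approximate quantile for ungrouped columns. If an approximate answer over the whole column (not per group) is acceptable, `Series.quantile` gives you the 50th percentile directly:

import dask

ddf = dask.datasets.timeseries()
# approximate 50th percentile of the whole column (not a per-group median)
approx_median = ddf["x"].quantile(0.5).compute()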