How can I get the same result in Dask that I'm getting in pandas?

The objective is to have a uniform time interval for each group, replicating the last value until we have a new one.

import pandas as pd
import numpy as np
import datetime

data=pd.DataFrame([["AAAA","2020-01-15",2],
                    ["AAAA","2020-02-15",9],
                    ["AAAA","2020-02-20",2],
                    ["AAAA","2020-02-25",9],
                    ["AAAA","2020-04-18",2],
                    ["BBBB","2020-01-01",5],
                    ["BBBB","2020-02-15",5],
                    ["BBBB","2020-02-20",4],
                    ["BBBB","2020-02-25",4],
                    ["BBBB","2020-04-15",2],
                    ["CCCC","2020-01-01",9],
                    ["CCCC","2020-02-15",5],
                    ["CCCC","2020-03-20",7],
                    ["CCCC","2020-04-25",4],
                    ["CCCC","2020-05-15",2]])
                  
data.columns=['Asset','Date','P']
data['Date']=pd.to_datetime(data['Date'])
data.index=data['Date'].values

# forward-fill each asset onto a 2-day grid (pad() was a deprecated alias of ffill())
temp=data.groupby('Asset').resample('2D').ffill()
temp

Note: this is just an example; the real-world dataset is much larger.

Thanks!

1 Answer

The current version of Dask does not fully replicate pandas' .resample() functionality. My suggestion would be to either look into xarray (if you want a grid-like structure) or use dask.delayed wrapped around pandas.
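
A minimal sketch of the dask.delayed option, assuming each asset's rows fit in memory on their own. The helper name resample_group and the small sample frame are made up for illustration; in a real application each group would more likely be loaded lazily from its own file rather than split out of one in-memory DataFrame.

```python
import pandas as pd
import dask
from dask import delayed


def resample_group(group: pd.DataFrame, rule: str = "2D") -> pd.DataFrame:
    """Forward-fill one asset's rows onto a regular time grid with plain pandas."""
    # Hypothetical helper: the per-group work is ordinary pandas code.
    return group.set_index("Date").resample(rule).ffill().reset_index()


# Small illustrative sample, not the data from the question.
data = pd.DataFrame(
    {
        "Asset": ["AAAA", "AAAA", "BBBB", "BBBB"],
        "Date": pd.to_datetime(
            ["2020-01-15", "2020-01-19", "2020-01-01", "2020-01-05"]
        ),
        "P": [2, 9, 5, 4],
    }
)

# Build one lazy task per asset; nothing is computed at this point.
tasks = [delayed(resample_group)(group) for _, group in data.groupby("Asset")]

# Run the tasks (in parallel where the scheduler allows) and stitch the
# per-asset pandas results back together.
parts = dask.compute(*tasks)
result = pd.concat(parts, ignore_index=True)
print(result)
```

Each task only ever sees one asset, so the resampling itself never needs Dask support; Dask just schedules the independent pandas calls.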

SultanOrazbayev