I'm new to dask (imported as dd) and am trying to convert some pandas (imported as pd) code.
The goal of the following lines is to restrict the data, in dask, to those columns whose values fulfill a calculated requirement.
The data is a given table in a CSV file. The original pandas code reads
inputdata = pd.read_csv("inputfile.csv")
pseudoa = inputdata.quantile([.035, .965])
pseudob = pseudoa.diff().loc[.965]
inputdata = inputdata.loc[:, inputdata.columns[pseudob.values > 0]]
inputdata.describe()
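For reference, here is a minimal, self-contained sketch of what this pandas filter does, using toy data with hypothetical column names in place of "inputfile.csv": it keeps only the columns whose spread between the 3.5th and 96.5th percentiles is positive.

```python
import numpy as np
import pandas as pd

# Toy stand-in for "inputfile.csv": one spread-out column, one constant column
inputdata = pd.DataFrame({"spread": np.arange(100.0), "constant": np.ones(100)})

pseudoa = inputdata.quantile([.035, .965])   # 3.5th and 96.5th percentiles per column
pseudob = pseudoa.diff().loc[.965]           # inter-quantile range of each column
inputdata = inputdata.loc[:, inputdata.columns[pseudob.values > 0]]

print(list(inputdata.columns))               # the constant column is dropped
```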
and it works fine. My simple idea for the conversion was to substitute the first line with
inputdata = dd.read_csv("inputfile.csv")
but that resulted in the strange error message IndexError: too many indices for array.
Even after switching to already-computed data in inputdata and pseudob, the error remains.
Maybe the question really comes down to how computed boolean slicing of dask columns is supposed to work.
I just found a (maybe suboptimal) workaround (not a real solution). Changing line 4 of the code to the following
inputdata = inputdata.loc[:, inputdata.columns[(pseudob.values > 0).compute()[0]]]
seems to work.