I have a Dask dataframe created with dd.read_csv("./*/file.csv"), where the * glob matches one folder per date. In the concatenated dataframe I want to filter out subsets of time, say the way I would with DataFrame.between_time("09:30", "16:00") in Pandas.
Because Dask's internal representation of the index lacks the nice features of Pandas's DatetimeIndex, I haven't had any success filtering the way I normally would in Pandas. Short of resorting to a naive mapping function/loop, I can't get this to work in Dask.
Since the partitions are by date, perhaps that could be exploited by converting each partition to a Pandas dataframe and then back to a Dask partition, but it seems like there should be a better way.
Updating with the example used in Angus' answer.
I guess I don't understand the logic of the queries in the answers/comments. Is Pandas smart enough not to compare against the string literally, and instead do the correct datetime comparison?
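A quick self-contained check of what I mean (toy timestamps, made up here):

```python
import pandas as pd

# Five hourly timestamps starting at 09:00
idx = pd.date_range("2021-01-04 09:00", periods=5, freq="h")
s = pd.Series(range(5), index=idx)

# Comparing a DatetimeIndex against a string: Pandas parses the string
# into a Timestamp first, so the mask comes from a datetime comparison,
# not a lexicographic string comparison.
mask = s.index >= "2021-01-04 11:00"
```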