
I am trying to convert a pandas dataframe that is MultiIndexed on two variables (an ID and a DateTime variable) to a dask dataframe, but I get the following error:

"NotImplementedError: Dask does not support MultiIndex Dataframes" 

I am using the following code:

import pandas as pd
import dask.dataframe as dd

dask_df = dd.from_pandas(pandas_df)

I actually have over 700 pandas dataframes (each over 100 MB). I am planning to convert each one to a dask dataframe and then append them all into one big dask dataframe so I can analyze the whole dataset. I think the MultiIndex is the only issue here. Please let me know if I am going about this the wrong way.

Sher Afghan
  • Be aware that while dask seems to support multi-level columns, they behave somewhat differently from pandas and may be best avoided too. – creanion Oct 27 '21 at 09:44

1 Answer


Currently Dask DataFrame does not support dataframes with MultiIndexes.

You might consider converting all but one of your index columns into normal columns with reset_index.

MRocklin