How to check if dask dataframe is empty if lazily evaluated?

Question

I am aware of this question, but consider the minimal working example below:

import dask.dataframe as dd
import pandas as pd

# Initialise the data as a dict of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)
dask_df = dd.from_pandas(df, npartitions=1)

categoric_df = dask_df.select_dtypes(include="category")

When I try to print categoric_df, I get the following error:

ValueError: No objects to concatenate

And when I inspect categoric_df in the PyCharm debugger:

Unable to get repr for <class 'dask.dataframe.core.DataFrame'>

With these errors, I could build a try/except block to check whether the dataframe is empty. But I don't want to use this approach, since it is not guaranteed to work every time and try/except slows down the code. When I print the computed categoric_df, it looks like this:

>>> print(categoric_df.compute())
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

In summary: if I select a dtype that is not present in the dask.DataFrame, I get a dask.DataFrame which, at first glance, does not seem empty when I use the len() function.

>>> print(len(categoric_df))
4
>>> print(len(categoric_df.compute()))
4
>>> print(categoric_df.compute().empty)
True

Is there a way to check if the categoric_df is empty or not without computing it? (I want it to stay lazily evaluated.)

UPDATE: print(len(categoric_df.columns)) is returning 0. This can be used for figuring out if the dataframe is empty or not. But is this viable? I am not sure.

I've just figured out that I can use `print(len(categoric_df.columns))` which prints out `0` if the dataframe is empty. But is this viable? I am not sure. — MehmedB, Dec 29 '19 at 15:12

score 1 · Accepted Answer · answered Jan 01 '20 at 03:20

It looks like you've run into a bug where a dataframe isn't printing correctly. If you feel like raising a bug report, https://github.com/dask/dask/issues/new would be the right place to report this.

This shouldn't affect the check that you want to do though. Looking at .columns to see if there are any columns seems reasonable. The fact that the dataframe still has rows just means that there is still an index.

MRocklin

I just opened an issue as you suggested. I am waiting for a response... – MehmedB Jan 02 '20 at 18:18
@MehmedB if dask_df contains headers but no rows, len(dask_df.columns) won't return 0, so it cannot detect that the dataframe is empty. – jerrytim Feb 11 '20 at 20:05
@MehmedB Could you provide a link to the issue for reference, please? – gies0r Jul 21 '20 at 00:05
@gies0r Here: https://github.com/dask/dask/issues/5761 – MehmedB Jul 21 '20 at 08:47
