I am aware of this question, but consider the minimal working example below:
import dask.dataframe as dd
import pandas as pd
# Initialise the data as a dict of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
dask_df = dd.from_pandas(df, npartitions=1)
categoric_df = dask_df.select_dtypes(include="category")
When I try to print categoric_df, I get the following error:
ValueError: No objects to concatenate
And when I inspect categoric_df in the PyCharm debugger, it shows:
Unable to get repr for <class 'dask.dataframe.core.DataFrame'>
Given these errors, I could use a try/except block to check whether the dataframe is empty. But I don't want to use this approach, since it is not guaranteed to work every time and try/except slows down the code.
And when I print the computed categoric_df, it looks like this:
>>>print(categoric_df.compute())
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
In summary: if I select a dtype that does not exist in the data and create a dask.DataFrame from the result, I get a dask.DataFrame which at first glance doesn't seem empty when I use the len() function.
>>>print(len(categoric_df))
4
>>>print(len(categoric_df.compute()))
4
>>>print(categoric_df.compute().empty)
True
Is there a way to check whether categoric_df is empty without computing it? (I want it to stay lazily evaluated.)
UPDATE:
print(len(categoric_df.columns)) returns 0. This could be used to figure out whether the dataframe is empty. But is it a reliable approach? I am not sure.