How to check if dask dataframe is empty

Question

Is there a dask equivalent of the pandas `empty` attribute? I want to check whether a dask dataframe is empty, but `df.empty` raises `AttributeError: 'DataFrame' object has no attribute 'empty'`.

Adding the `empty` method would be an easy addition to the project if anyone wants to contribute a pull request. — MRocklin, May 07 '18 at 11:31

cs95 · Accepted Answer · 2019-04-16T17:28:29.327

8

Dask doesn't currently support this, but you can compute the length on the fly:

    len(df) == 0

    len(df.index) == 0  # Likely to be faster

edited Apr 16 '19 at 17:28

answered May 07 '18 at 04:43

cs95


Probably `len(df.index)==0` should be faster – skibee Apr 16 '19 at 16:32
@JosephBerry this is true with pandas, so I'm guessing you're right. Will test in a bit. – cs95 Apr 16 '19 at 17:29
What is the time complexity of this operation? O(1)? Distributed O(1)? Or O(n) or distributed O(n)? – CMCDragonkai Nov 06 '19 at 22:18
@CMCDragonkai I'm not familiar with dask's internals. I don't think the length is stored, so it has to be pre-computed at least the first time you call `len`. I would assume that is linear, although admittedly I don't understand the difference between O(n) and distributed O(n). – cs95 Nov 06 '19 at 23:19
Because Dask dataframes are distributed across partitions, which are spread across dask workers. I thought it would be distributed O(n). But I reckon the index might be precomputed ahead of time and shared across all partitions. Maybe it is actually O(1). Hopefully somebody from Dask can clarify. – CMCDragonkai Nov 06 '19 at 23:28
This doesn't work all the time. Check my question here: https://stackoverflow.com/questions/59511235/how-to-check-if-dask-dataframe-is-empty-and-lazily-evaluated – MehmedB Dec 28 '19 at 13:16
This is a very inefficient solution for checking whether just **one** element is in the dataframe. It could mean counting millions or billions of rows when you only need to find one. – gies0r Jul 21 '20 at 00:04
I can only say that `len(df.head().index)` and `len(df.sample(frac=0.01).index)` are no faster than `len(df.index)`, sadly. – gies0r Jul 22 '20 at 10:36
