Is there a dask equivalent of the pandas `empty` property?
I want to check whether a dask dataframe is empty, but `df.empty` raises `AttributeError: 'DataFrame' object has no attribute 'empty'`.

user308827
-
I don't think so, but you can query `len(df) == 0`? – cs95 May 07 '18 at 03:34
-
That does seem to work, thanks! – user308827 May 07 '18 at 04:34
-
Adding the `empty` method would be an easy addition to the project if anyone wants to contribute a pull request. – MRocklin May 07 '18 at 11:31
1 Answer
Dask doesn't currently support this, but you can compute the length on the fly:
len(df) == 0
len(df.index) == 0 # Likely to be faster

cs95
-
@JosephBerry this is true with pandas, so I'm guessing you're right. Will test in a bit. – cs95 Apr 16 '19 at 17:29
-
What is the time complexity of this operation? O(1)? Distributed O(1)? Or O(n) or distributed O(n)? – CMCDragonkai Nov 06 '19 at 22:18
-
@CMCDragonkai I'm not familiar with dask's internals. I don't think the length is stored, so it has to be pre-computed at least the first time you call `len`. I would assume that is linear, although admittedly I don't understand the difference between O(n) and distributed O(n). – cs95 Nov 06 '19 at 23:19
-
Because Dask dataframes are distributed across partitions, which are spread across Dask workers, I thought it would be distributed O(n). But I reckon the index might be precomputed ahead of time and shared across all partitions, so maybe it is actually O(1). Hopefully somebody from Dask can clarify. – CMCDragonkai Nov 06 '19 at 23:28
-
This doesn't work all the time. Check my question here: https://stackoverflow.com/questions/59511235/how-to-check-if-dask-dataframe-is-empty-and-lazily-evaluated – MehmedB Dec 28 '19 at 13:16
-
This is a very inefficient solution for checking whether just **one** element is in the dataframe. It could end up counting millions or billions of rows when you only want to find one. – gies0r Jul 21 '20 at 00:04
-
I can only say that `len(df.head().index)` and `len(df.sample(frac=0.01).index)` are just as fast as `len(df.index)`, sadly. – gies0r Jul 22 '20 at 10:36