Is there a dask equivalent of the pandas `empty` attribute? I want to check whether a dask dataframe is empty, but `df.empty` raises AttributeError: 'DataFrame' object has no attribute 'empty'

user308827

1 Answer

Dask doesn't currently support this, but you can compute the length on the fly:

len(df) == 0

len(df.index) == 0 # Likely to be faster 
cs95
  • Probably `len(df.index)==0` should be faster – skibee Apr 16 '19 at 16:32
  • @JosephBerry this is true with pandas, so I'm guessing you're right. Will test in a bit. – cs95 Apr 16 '19 at 17:29
  • What is the time complexity of this operation? O(1)? Distributed O(1)? Or O(n) or distributed O(n)? – CMCDragonkai Nov 06 '19 at 22:18
  • @CMCDragonkai I'm not familiar with dask's internals. I don't think the length is stored, so it has to be computed at least the first time you call `len`. I would assume that is linear, although admittedly I don't understand the difference between O(n) and distributed O(n). – cs95 Nov 06 '19 at 23:19
  • Dask dataframes are distributed across partitions, which are spread across dask workers, so I thought it would be distributed O(n). But I reckon the index might be precomputed ahead of time and shared across all partitions, so maybe it is actually O(1). Hopefully somebody from Dask can clarify. – CMCDragonkai Nov 06 '19 at 23:28
  • This doesn't work all the time. Check my question here: https://stackoverflow.com/questions/59511235/how-to-check-if-dask-dataframe-is-empty-and-lazily-evaluated – MehmedB Dec 28 '19 at 13:16
  • This is a very inefficient solution for checking whether even **one** element is in the dataframe. It could mean counting millions or billions of rows when you only need to find one. – gies0r Jul 21 '20 at 00:04
  • I can only say that `len(df.head().index)` and `len(df.sample(frac=0.01).index)` are no faster than `len(df.index)`, sadly. – gies0r Jul 22 '20 at 10:36
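
One way to approach the concern in the comments about counting every row is to look at partitions one at a time and stop at the first non-empty one. Below is a minimal sketch of that idea. `is_empty` is a hypothetical helper, not a dask function. It assumes only that the object exposes `npartitions` and the `partitions[i]` accessor, as a dask dataframe does. Whether it actually saves time depends on how expensive the upstream task graph is, which may be why `head()` was reported above to be no faster.

```python
def is_empty(ddf):
    """Return True if no partition of `ddf` contains any rows.

    Hypothetical helper (an assumption for illustration, not dask API).
    Partitions are checked one by one so the scan can stop early.
    """
    for i in range(ddf.npartitions):
        # len() on the index of a single partition only needs that
        # partition (plus whatever upstream tasks it depends on).
        if len(ddf.partitions[i].index) > 0:
            return False  # found at least one row; stop scanning
    return True
```

With a real dask dataframe this would be called as `is_empty(df)`. In the worst case (a truly empty dataframe) every partition is still visited, so it is no cheaper than `len(df.index) == 0` there.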