
I've worked with the h2o R package for quite a while now, but I recently had to move to the Python package.

For the most part, an H2OFrame is designed to work like a pandas DataFrame. However, there are several hurdles I haven't managed to get over. In pandas, if I want to drop some rows, I can do this:

df.drop([0,1,2], axis=0, inplace=True)

However, I cannot figure out how to do the same with an H2OFrame:

frame.drop([0,1,2], axis=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-0eff75c48e35> in <module>()
----> frame.drop([0,1,2], axis=0)

TypeError: drop() got an unexpected keyword argument 'axis'

The source on GitHub documents the drop method as working only on columns, so the pandas-style call clearly isn't supported:

def drop(self, i):
    """Drop a column from the current H2OFrame.

Is there a way to drop rows from an H2OFrame?

TayTay

2 Answers


Currently, the H2OFrame.drop method does not support this, but we have added a ticket to add support for dropping multiple rows (and multiple columns).

In the meantime, you can subset rows by an index:

import h2o
h2o.init(nthreads = -1)

hf = h2o.H2OFrame([[1,3],[4,5],[3,0],[5,5]])  # 4 rows x 2 columns
hf2 = hf[[1,3],:]  # Keep some of the rows by passing an index

Note that the index list, [1,3], is ordered. If you try to pass [3,1] instead, you will get an error. H2O will not reorder the rows, and this is its way of telling you that. If you have a list of out-of-order indexes, just wrap the sorted function around it first.

hf2 = hf[sorted([3,1]),:]  # sorted([3,1]) gives [1,3]

Lastly, if you prefer, it's also okay to reassign the new subsetted frame to the original frame name, as follows:

hf = hf[[1,3],:]
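Since the workaround keeps rows rather than dropping them, a drop can be expressed as "keep everything else". The following is only a minimal sketch of that idea: `rows_to_keep` is a hypothetical helper (not part of h2o), and it assumes the row count is known, for example from the frame's `nrow` property.

```python
def rows_to_keep(nrows, drop):
    """Return the sorted row indices in range(nrows) that are not in `drop`."""
    drop_set = set(drop)  # set lookup keeps the filter fast for long lists
    # Iterating over range() yields indices in ascending order,
    # which is the ordering the row subset above expects.
    return [i for i in range(nrows) if i not in drop_set]


# With the 4-row frame above, dropping rows 0 and 2 keeps rows 1 and 3.
keep = rows_to_keep(4, [0, 2])
print(keep)  # [1, 3]

# Applying it to a frame needs a running H2O cluster, so it is left commented:
# hf = hf[rows_to_keep(hf.nrow, [0, 2]), :]
```

Because the result is built in ascending order, the list of rows to drop can be passed in any order without calling sorted first.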
Erin LeDell

Since this is now supported I wanted to highlight the comment that says how to drop by index:

df = df.drop([0,1,2], axis=0)

If axis=1 (the default), drop removes columns; if axis=0, it removes rows.

drop(index, axis=1)

where index is a list of column indices, column names, or row indices to drop; or a string to drop a single column by name; or an int to drop a single column by index.
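If the same code has to run against both older and newer h2o releases, one option is to try the axis=0 call first and fall back to row subsetting when it is rejected. This is only a sketch under stated assumptions: `drop_rows` is a hypothetical wrapper, it assumes older releases raise the TypeError shown in the question, and it assumes the frame exposes its row count as `nrow`.

```python
def drop_rows(frame, rows):
    """Drop rows by index, preferring the native drop(..., axis=0)."""
    try:
        # Newer releases: drop accepts axis=0 for rows.
        return frame.drop(rows, axis=0)
    except TypeError:
        # Assumed older behavior: drop() has no `axis` keyword, so keep
        # the complement of `rows` (in ascending order) by subsetting.
        dropped = set(rows)
        keep = [i for i in range(frame.nrow) if i not in dropped]
        return frame[keep, :]


# Usage (needs a running H2O cluster, so it is left commented):
# df = drop_rows(df, [0, 1, 2])
```

Catching TypeError this broadly could also hide an unrelated TypeError raised inside drop, so checking the installed version explicitly may be the safer choice in production code.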

Lauren