I am reading an H2OFrame from a CSV file:
val h2oFrame = new H2OFrame(new File(inputCsvFilePath))
How can I perform the equivalent of a .filter() operation (as available for a Spark DataFrame or RDD)? For example, how do I get a new H2OFrame containing only the rows where "label" (a column name) is > 1?
I have tried converting to an org.apache.spark.sql.DataFrame as below (simplified example):
val df = asDataFrame(h2oFrame)
val dff = df.filter(s"label > 1")
print(dff.toString(0,15))
But this seems to throw an OutOfMemoryError, as shown below:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Executor task launch worker-2"