
I am trying to create an ML application in which a front end takes user information and data, cleans it, and passes it to h2o AutoML for modeling, then recovers and visualizes the results. Since the back end will be a stand-alone / always-on service that gets called many times, I want to ensure that all objects created in each session are removed, so that h2o doesn't get cluttered and run out of resources. The problem is that many objects are being created, and I am unsure how to identify/track them, so that I can remove them before disconnecting each session.

Note that I would like the ability to run more than one analysis concurrently, which means I cannot just call remove_all(), since this may remove objects still needed by another session. Instead, it seems I need a list of session objects, which I can pass to the remove() method. Does anyone know how to generate this list?

Here's a simple example:

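import h2o
import pandas as pd
from h2o.automl import H2OAutoML

# Connect to a running H2O cluster (or start a local one)
h2o.init()

# Raw string so the backslash in the Windows path is not read as an escape
df = pd.read_csv(r"C:\iris.csv")
my_frame = h2o.H2OFrame(df, destination_frame="my_frame")

aml = H2OAutoML(max_runtime_secs=100)
aml.train(y='class', training_frame=my_frame)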

Looking in the Flow UI shows that this simple example generated 5 new frames, and 74 models. Is there a session ID tag or something similar that I can use to identify these separately from any objects created in another session, so I can remove them?

[Screenshot: Frames Created]

[Screenshot: Models Created]

Helenus the Seer
  • Did you try `h2o.remove(aml)`? This should delete the AutoML instance on the backend and cascade to all the submodels. It won't delete the training frame, though. – Seb Jun 22 '20 at 13:31
  • @Seb, I thought I had tried this already, but maybe there was old data still there. When I tried it again, it worked! Please post as an answer so I can approve. Much appreciated... – Helenus the Seer Jun 23 '20 at 00:28

2 Answers

1

You can use h2o.ls() to list the H2O objects. Then you can use h2o.remove('YOUR_key') to remove ones you don't want to keep.

For example:

# List all objects on the cluster (returned as a pandas DataFrame)
h_objects = h2o.ls()
# Keep only the keys that belong to one AutoML session
filtered_objects = h_objects[h_objects['key'].str.contains('AutoML_YYYYMMDD_xxxxxx')]
for key in filtered_objects['key']:
    h2o.remove(key)

Alternatively, you can remove all AutoML objects by using the filter below.

filtered_objects = h_objects[h_objects['key'].str.lower().str.contains('automl')]
  • Thanks, Neema. As mentioned, I don't want to remove all AutoML objects, since there may be objects related to a different session which are still needed. If I can determine - or even better, if I can specify in advance - the 'key' you mentioned ('AutoML_YYYYMMDD_xxxxx'), then I agree I can use this method. Right now I can only think to try parsing the names of the models in my leaderboard to get this, and then following your method. Do you know a better way? I wish h2o had a remove_session_objects() method or something, it would be so much easier. – Helenus the Seer Jun 22 '20 at 05:31
  • For those searching, the answer to this question is the aml.project_name attribute. You will still have to remove the frames separately (which is easy to do since you created these explicitly anyway), but you can target the rest of the objects with the following command: h2o.remove([k for k in h2o.ls()['key'] if aml.project_name in k]) – Helenus the Seer Jun 23 '20 at 00:31
  • Nice find! Yes, `aml.project_name` will give you the key – Neema Mashayekhi Jun 23 '20 at 02:25
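
The project-name filter from the comments above could be wrapped in a small helper. This is only a sketch: `keys_for_project` is a hypothetical name, not part of h2o, and it assumes (as the comments report) that every object AutoML creates for a run carries that run's project name somewhere in its key.

```python
from typing import Iterable, List


def keys_for_project(keys: Iterable[str], project_name: str) -> List[str]:
    """Return the keys that contain project_name, preserving input order.

    Hypothetical helper, not part of h2o. Assumes AutoML embeds the
    project name in the key of every object it creates for that run.
    """
    if not project_name:
        # An empty string is a substring of every key, so it would
        # select (and later delete) objects from all sessions.
        raise ValueError("project_name must be a non-empty string")
    return [key for key in keys if project_name in key]
```

Against a live cluster this might be used as `h2o.remove(keys_for_project(h2o.ls()['key'], aml.project_name))`. Frames you created yourself still have to be removed separately.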
1

The recommended way to clean only your work is to use h2o.remove(aml). This will delete the automl instance on the backend and cascade to all the submodels and attached objects like metrics. It won't delete the frames that you provided though (e.g. training_frame).
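
For an always-on service, one way to make sure this cleanup happens even when a request fails is a context manager that tracks what a session created. The following is a minimal sketch, not an h2o feature: `session_objects` is a hypothetical name, and the `remove` argument is assumed to be a callable such as `h2o.remove`.

```python
from contextlib import contextmanager


@contextmanager
def session_objects(remove):
    """Track objects created in one session and remove them on exit.

    Hypothetical helper, not part of h2o. `remove` is any callable that
    deletes a single object (h2o.remove is the assumed candidate).
    """
    tracked = []
    try:
        yield tracked
    finally:
        # Remove in reverse creation order; keep going if one removal
        # fails so the remaining objects are still cleaned up.
        for obj in reversed(tracked):
            try:
                remove(obj)
            except Exception as exc:
                print(f"cleanup failed for {obj!r}: {exc}")
```

In an h2o session this might look like `with session_objects(h2o.remove) as tracked:`, appending `my_frame` and `aml` to `tracked` as they are created. The models would then go through the cascade described above, and the training frame would be removed explicitly. Other concurrent sessions are untouched because only tracked objects are passed to `remove`.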

Seb