
I just found this and I'm interested in exploring using the SQL store to compute some of my learner outputs. I think the database call might be much faster than base R or Python.

Awesome vignette: http://cran.nexr.com/web/packages/tidypredict/vignettes/randomForest.html

Basic code:

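# pacman::p_load installs (if needed) and attaches the listed packages
require(pacman)
p_load(randomForest, tidypredict, dbplyr)

# Fit a small random forest classifier on iris
model <- randomForest(Species ~ ., data = iris, ntree = 10, proximity = TRUE)
# Translate the fitted trees into SQL for a simulated MS SQL Server connection
tidypredict_sql(model, dbplyr::simulate_mssql())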

It gives the SQL code for each of the trees, which I can then aggregate using the mode to get the estimator output.
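To make the "aggregate using the mode" step concrete, here is a minimal base R sketch of a majority vote across trees. The `tree_votes` matrix and the `majority_vote` helper are hypothetical and stand in for the per-tree predictions; they are not produced by tidypredict itself.

```r
# Hypothetical per-tree class predictions:
# one row per observation, one column per tree.
tree_votes <- rbind(
  c("setosa",     "setosa",    "versicolor"),
  c("versicolor", "virginica", "versicolor"),
  c("virginica",  "virginica", "setosa")
)

# Return the most frequent class in a vector of votes (the mode).
# Ties go to the class that sorts first alphabetically, because
# table() orders its names and which.max() picks the first maximum.
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]
}

# Apply the vote row by row to get one prediction per observation.
predicted <- apply(tree_votes, 1, majority_vote)
print(predicted)
```

In a database, the same idea would presumably be a count of votes per class followed by picking the class with the highest count.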

So how do I use it with something like h2o.ai, especially h2o.randomForest?

EngrStudent

1 Answer


In the vignette, the data originates from local R memory and the model is a local R model (this can be inferred from parse_model(model) and tidypredict_fit(model) halfway down the page). The SQL component you refer to is only being used to display the model in a format that is easier to read.

From your question I understand that you want to read data from an SQL database and fit the model using SQL. However, from the vignette, the data is not stored in SQL and the model is not being fit using SQL.

From some (brief) research, h2o.ai seems to be software or a platform rather than a database. dbplyr provides translations from standard dplyr commands into database languages (SQL). This allows you to write manipulations in R and have them translated and executed on the database without pulling the data into local R memory.
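As a rough illustration of that translation, the sketch below builds a dplyr pipeline against a simulated SQL Server connection and renders the SQL that dbplyr would send. No real database is involved, and the `lazy_iris` and `query` names are just for this example.

```r
library(dplyr)
library(dbplyr)

# Wrap iris as a lazy table on a simulated MS SQL Server backend.
# Nothing is executed; dbplyr only records the operations.
lazy_iris <- tbl_lazy(iris, con = simulate_mssql())

# Ordinary dplyr verbs, written exactly as for a local data frame.
query <- lazy_iris %>%
  filter(Petal.Length > 2.5) %>%
  select(Species, Petal.Length)

# Render the SQL that would be run on the database.
sql_text <- sql_render(query)
print(sql_text)
```

Against a real connection, the same pipeline would only pull rows into R when `collect()` is called.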

I have written R functions that could handle local or database tables as input. But this required some deliberate design choices. So while you might be able to pass a remote dbplyr table to an h2o.ai model, unless the underlying functions are written in R and in a way that works with dbplyr translation, I would expect the model to error.
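A minimal sketch of what such a function can look like, assuming it is restricted to dplyr verbs and expressions that dbplyr knows how to translate. The `flag_long_petals` function is hypothetical and is not taken from any package.

```r
library(dplyr)
library(dbplyr)

# Hypothetical helper: works on a local data frame or a remote table
# because it only uses verbs and functions that dbplyr can translate.
flag_long_petals <- function(tbl, threshold) {
  tbl %>%
    mutate(long_petal = ifelse(Petal.Length > threshold, 1L, 0L))
}

# Local input: evaluated immediately in R memory.
local_result <- flag_long_petals(iris, 2.5)

# Remote-style input: a lazy table on a simulated SQL Server backend,
# so the same code is translated to SQL instead of being run in R.
remote_result <- flag_long_petals(tbl_lazy(iris, con = simulate_mssql()), 2.5)
print(sql_render(remote_result))
```

A function that calls arbitrary R code on its input (as most modelling functions do) has no such translation, which is why passing a remote table to it would be expected to fail.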

It would be very surprising for commercial AI software not to have a built-in way of accessing databases. So it would probably be more effective to investigate how h2o.ai recommends ingesting data.

Simon.S.A.