I'm fairly new to the SAP world and am trying to work with the R Server installed within SAP HANA Studio (HANA Studio version 2.3.8, R Server version 3.4.0).

My tasks are:

  • Train the randomForest model on the R Server within HANA Studio (with the help of an RLANG procedure on HANA)
  • Save the randomForest model as a PAL model object in HANA
  • Make predictions on new data in HANA using this model

Here is a small example of an RLANG procedure for training and saving the model on HANA:

    PROCEDURE "PA"."RF_TRAIN" (
        IN data "PA"."IRIS",
        OUT modelOut "PA"."TRAIN_MODEL"
    )
    LANGUAGE RLANG
    SQL SECURITY INVOKER
    DEFAULT SCHEMA "PA"
    AS
    BEGIN

    require(randomForest)
    require(dplyr)
    require(pmml)
    # iris <- as.data.frame(data)
    data(iris)
    iris <- iris %>% mutate(y = factor(ifelse(Species == "setosa", 1, 0)))
    model <- randomForest(y ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, iris,
                          importance = TRUE,
                          ntree = 500)
    modelOut <- as.data.frame(pmml(model))

    END;

(Please don't be confused by the fact that I'm not using my input data for model training; this is not a real example.)

Here is what the table with the model should look like in SAP HANA:

[Image: model on SAP HANA]

In this example the training works, but I'm not sure how to save the randomForest object in the SAP HANA database, or how to convert it into an object similar to the one in the picture.

Would appreciate any help :)

Sandra Rossi
KayEd

1 Answer

If you plan to use the R server for your predictions, you can store your randomForest model as a BLOB in SAP HANA.

Following the SAP HANA R Integration Guide, you need to:

  1. Add a BLOB column to your table "PA"."TRAIN_MODEL".
  2. Convert the model to binary with the function serialize before writing it to your table.
  3. Load and unserialize your model when calling the predict procedure.

In your R script, this would give:

    require(randomForest)
    require(dplyr)
    require(pmml)

    # Serialize each argument into a raw vector and return the results
    # as a one-column data frame suitable for a BLOB column
    generateRobjColumn <- function(...) {
        result <- as.data.frame(cbind(
            lapply(
                list(...),
                function(x) if (is.null(x)) NULL else serialize(x, NULL)
            )
        ))
        names(result) <- NULL
        names(result[[1]]) <- NULL
        result
    }

    # iris <- as.data.frame(data)
    data(iris)
    iris <- iris %>% mutate(y = factor(ifelse(Species == "setosa", 1, 0)))
    model <- randomForest(y ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, iris,
                          importance = TRUE,
                          ntree = 500)
    # One row: an ID plus the serialized object for the BLOB column
    modelOut <- data.frame(ID = 1, MODEL = generateRobjColumn(pmml(model)))

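Note that you don't actually need pmml if you plan to re-use the model as is: serialize `model` itself instead of `pmml(model)`, since `predict` needs the randomForest object rather than its PMML representation.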
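The serialize/unserialize round trip can be tried out in plain R before involving HANA. The following is only a minimal sketch: it uses an `lm` model from base R as a stand-in for the randomForest model (an assumption made so the snippet runs without extra packages), and the variable names `blob` and `restored` are hypothetical.

```r
# Stand-in model: a linear model on iris (substitute for the randomForest object)
data(iris)
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# serialize() with connection = NULL returns the object as a raw vector,
# which is the kind of value that goes into the BLOB column
blob <- serialize(model, NULL)

# unserialize() rebuilds an equivalent R object from the raw vector
restored <- unserialize(blob)

# The restored model should predict like the original one
pred_original <- predict(model, newdata = iris)
pred_restored <- predict(restored, newdata = iris)
stopifnot(is.raw(blob), isTRUE(all.equal(pred_original, pred_restored)))
```

The same two calls should apply to a randomForest object, since serialize works on arbitrary R objects.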

In another procedure, you then pass in this table and unserialize your model for prediction.

    CREATE PROCEDURE "PA"."RF_PREDICT" (
        IN data "PA"."IRIS",
        IN modelOut "PA"."TRAIN_MODEL",
        OUT result "PA"."PRED"
    )
    LANGUAGE RLANG AS
    BEGIN
        rfModel <- unserialize(modelOut$MODEL[[1]])
        result <- predict(rfModel, newdata = data) # or whatever steps you need for prediction
    END;
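As far as I understand the R integration, a table output parameter of an RLANG procedure has to be assigned a data frame, while `predict` returns a plain vector. Below is a minimal sketch of that wrapping step, under stated assumptions: an `lm` model stands in for the unserialized randomForest model, and the column names `ID` and `PREDICTION` are hypothetical and would have to match the actual definition of "PA"."PRED".

```r
# Hypothetical stand-ins for the unserialized model and the input table
data(iris)
rfModel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
data <- iris[1:5, ]

# predict() returns a vector; wrap it in a data frame for the OUT parameter
# (ID and PREDICTION are assumed column names)
pred <- predict(rfModel, newdata = data)
result <- data.frame(ID = seq_along(pred), PREDICTION = as.numeric(pred))
```

For a classification forest, `predict` returns a factor, so `as.character(pred)` would probably be the more suitable conversion there.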
AshOfFire
  • Thank you @AshOfFire, it helped me a lot. But I have one problem: when I apply the function generateRobjColumn, my R session gets killed – KayEd Nov 15 '17 at 10:43
  • Have you tried running the script in a local R session or directly on the R server? In both cases the script worked well for me. – AshOfFire Nov 15 '17 at 10:58
  • I've tried the local R session: with a small random forest like the one for the iris data it works well, but for the RF with my data it crashed every time, even with `options(java.parameters = "-Xmx14g")` – KayEd Nov 15 '17 at 11:05
  • Maybe it's due to the size of your data (I assume way larger than iris)? serialize seems to have a size limitation - see [this topic](https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/serialize) for more info. – AshOfFire Nov 15 '17 at 11:18
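To check whether size is the issue, one option is to measure how many bytes the serialized object takes before writing it to the table. This is a minimal sketch, assuming a base-R `lm` model as a stand-in for the real forest; `serialized_size_mb` is a hypothetical helper name.

```r
# Hypothetical helper: size of an object's serialized form in megabytes
serialized_size_mb <- function(obj) {
    length(serialize(obj, NULL)) / 1024^2
}

# Stand-in model; replace with the actual randomForest object
data(iris)
small_model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

size_mb <- serialized_size_mb(small_model)
cat(sprintf("Serialized size: %.3f MB\n", size_mb))
```

Running the same helper on the large model in a local session would show how far its serialized form is from the small iris case.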