
I have a lot of custom DataFrame transformations in my code. The first group is simple casting:

    dframe = dframe.withColumn("account_number", col("account").cast("decimal(38,0)"));

The second group is UDF transformations:

    // register the UDF once on the SparkSession
    spark.udf().register("monthExtractor",
            (UDF1<Timestamp, Integer>) s -> s.toLocalDateTime().getMonthValue(),
            DataTypes.IntegerType);
    dframe = dframe.withColumn("month", callUDF("monthExtractor", dframe.col("trans_date_t")));

They all work and the code is tested. But my final goal is to build an ML Pipeline out of this code so I can reuse it. So is there a way to convert the code above into Transformers?

Igor Kustov
  • Possible duplicate of [How to create a custom Transformer from a UDF?](http://stackoverflow.com/questions/35180527/how-to-create-a-custom-transformer-from-a-udf) –  Oct 25 '16 at 17:23
  • Found out an example: http://supunsetunga.blogspot.ru/2016/05/custom-transformers-for-spark.html – Igor Kustov Oct 26 '16 at 16:36

1 Answer


You can create your own feature transformation (with a UDF or another method) by extending Spark's Transformer, overriding its transform method, and putting your own operation inside.

The Spark code on GitHub gives some insight into extending Transformer this way, provided you create the wrapper objects that are necessary:

    override def transform(dataset: Dataset[_]): DataFrame = {
      transformSchema(dataset.schema, logging = true)
      val xModel = new feature.XModel()
      val xOp = udf { xModel.transform _ }
      dataset.withColumn($(outputCol), xOp(col($(inputCol))))
    }

where xModel and xOp are abstractions: the model transforms your dataset according to the operation you define.
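Putting this together for your month-extraction UDF, here is a minimal, self-contained sketch (in Scala, matching the snippet above). The class name MonthExtractor, its inputCol/outputCol params, and the assumption that the timestamp column contains no nulls are my own choices for illustration, not from the Spark source:

    import java.sql.Timestamp

    import org.apache.spark.ml.{Pipeline, Transformer}
    import org.apache.spark.ml.param.{Param, ParamMap}
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.{DataFrame, Dataset}
    import org.apache.spark.sql.functions.{col, udf}
    import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

    // Hypothetical custom Transformer wrapping the month-extraction UDF.
    class MonthExtractor(override val uid: String) extends Transformer {

      def this() = this(Identifiable.randomUID("monthExtractor"))

      val inputCol = new Param[String](this, "inputCol", "timestamp input column")
      val outputCol = new Param[String](this, "outputCol", "integer month output column")

      def setInputCol(value: String): this.type = set(inputCol, value)
      def setOutputCol(value: String): this.type = set(outputCol, value)

      // Same logic as the question's UDF; assumes the column holds no nulls.
      private val extractMonth = udf { ts: Timestamp => ts.toLocalDateTime.getMonthValue }

      override def transform(dataset: Dataset[_]): DataFrame = {
        transformSchema(dataset.schema, logging = true)
        dataset.withColumn($(outputCol), extractMonth(col($(inputCol))))
      }

      override def transformSchema(schema: StructType): StructType = {
        require(schema.fieldNames.contains($(inputCol)),
          s"Input column ${$(inputCol)} is missing")
        StructType(schema.fields :+ StructField($(outputCol), IntegerType, nullable = false))
      }

      override def copy(extra: ParamMap): MonthExtractor = defaultCopy(extra)
    }

The stage then composes with any other stages and can be reused:

    val monthExtractor = new MonthExtractor()
      .setInputCol("trans_date_t")
      .setOutputCol("month")

    val pipeline = new Pipeline().setStages(Array(monthExtractor))
    val transformed = pipeline.fit(dframe).transform(dframe)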

marilena.oita