
Is there any way to use a Python function (aggregate, map, etc.) within a Java Flink job? I do not want to use the SQL API; I wonder whether the DataStream API alone can handle such functionality, i.e. without this syntax:

tableEnv.executeSql("CREATE TEMPORARY SYSTEM FUNCTION FunctionName AS 'function.FunctionName' LANGUAGE PYTHON");

Much more like in Beam:

static void runWordCount(WordCountOptions options) {
  Pipeline p = Pipeline.create(options);

  p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
      .apply(ParDo.of(new ExtractWordsFn()))
      .setRowSchema(ExtractWordsFn.SCHEMA)
      .apply(
          PythonExternalTransform.<PCollection<Row>, PCollection<Row>>from(
                  "apache_beam.dataframe.transforms.DataframeTransform",
                  options.getExpansionService())
              .withKwarg("func", PythonCallableSource.of("lambda df: df.groupby('word').sum()"))
              .withKwarg("include_indexes", true))
      .apply(MapElements.via(new FormatAsTextFn()))
      .apply("WriteCounts", TextIO.write().to(options.getOutput()));

  p.run().waitUntilFinish();
}
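For context, the `func` kwarg in the Beam snippet above is just an ordinary Python callable serialized as source text. A minimal pure-Python sketch of what `lambda df: df.groupby('word').sum()` computes (grouping rows by word and summing their counts, shown here without pandas; the input shape is my assumption based on the word-count pipeline):

```python
from collections import defaultdict

def group_and_sum(rows):
    """Group (word, count) pairs by word and sum the counts,
    mirroring df.groupby('word').sum() from the pipeline above."""
    totals = defaultdict(int)
    for word, count in rows:
        totals[word] += count
    return dict(totals)

print(group_and_sum([("a", 1), ("b", 2), ("a", 3)]))  # {'a': 4, 'b': 2}
```

In Beam, `PythonExternalTransform` ships this callable to a Python expansion service at pipeline-construction time; that cross-language expansion mechanism is exactly what the question asks whether Flink's DataStream API offers without going through the Table/SQL function registration.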
