
I am trying to get a small Apache Flink example running in Clojure, but I am stuck because of how Clojure type hints interact with Flink's type inference.

Here is my code:

(ns pipeline.core
 (:import
 (org.apache.flink.api.java ExecutionEnvironment)
 (org.apache.flink.api.common.functions FlatMapFunction)
 (org.apache.flink.api.java.tuple Tuple2)
 (org.apache.flink.util Collector)
 (java.lang String)))

(def flink-env (ExecutionEnvironment/createLocalEnvironment))

(def dataset (.fromElements flink-env (to-array ["please test me"])))

(defn tokenizer []
  (reify FlatMapFunction
    (flatMap [this value collector]
      (println value))))

(.flatMap dataset (tokenizer))

If I do not provide type hints, I get an error from the Flink API:

Caused by: java.lang.IllegalArgumentException: The types of the interface org.apache.flink.api.common.functions.FlatMapFunction could not be inferred. Support for synthetic interfaces, lambdas, and generic types is limited at this point.
at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:662)

If I provide type hints:

(defn tokenizer []
  (reify FlatMapFunction
    (^void flatMap [this ^String value ^Collector collector]
      (println value))))

I get an error from the Clojure compiler:

Caused by: java.lang.IllegalArgumentException: Can't find matching method: flatMap, leave off hints for auto match.
at clojure.lang.Compiler$NewInstanceMethod.parse(Compiler.java:8065) 

Is there a way to add type hints for generic classes in Clojure? It would have to look something like this:

(defn tokenizer []
  (reify FlatMapFunction
    (^void flatMap [this ^String value ^Collector<Tuple2<String, Integer>> collector]
      (println value))))

But that doesn't work. Any ideas?

The Leiningen configuration looks like this:

(defproject pipeline "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.7.0"]
                 [org.apache.flink/flink-java "0.9.0"]]
  :aot :all)
knuth

2 Answers


Clojure does not emit Java generic type information, so Flink cannot infer the function's output type through reflection. You therefore need to specify the return type manually via Flink's returns method.

(.returns (.flatMap dataset (tokenizer)) String)
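
The inference failure can be reproduced without Flink. The sketch below is in Java because the problem lives in Java's generics metadata. FlatMapLike, Typed, Raw, and ErasureDemo are hypothetical names used only for illustration, not Flink API, and the sketch assumes (consistent with the TypeExtractor error in the question) that type extraction depends on the generic interface information recorded in the class file. A class that implements the interface with concrete type arguments exposes them through reflection; a class that implements the raw interface, which is presumably what Clojure generates, exposes none.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generic function interface such as FlatMapFunction<T, O>.
interface FlatMapLike<T, O> {
    void flatMap(T value, List<O> out);
}

// Implements the interface with concrete type arguments, as Java code usually would.
class Typed implements FlatMapLike<String, Integer> {
    public void flatMap(String value, List<Integer> out) {
        out.add(value.length());
    }
}

// Implements the raw interface: no type arguments are recorded for this class.
@SuppressWarnings("rawtypes")
class Raw implements FlatMapLike {
    public void flatMap(Object value, List out) {
        // intentionally empty
    }
}

class ErasureDemo {
    // Names of the type arguments recorded for the first implemented
    // interface, or an empty list if the interface is implemented raw.
    static List<String> typeArguments(Class<?> cls) {
        List<String> names = new ArrayList<>();
        Type iface = cls.getGenericInterfaces()[0];
        if (iface instanceof ParameterizedType) {
            for (Type arg : ((ParameterizedType) iface).getActualTypeArguments()) {
                names.add(arg.getTypeName());
            }
        }
        return names;
    }

    // Erased parameter types of the interface's single method.
    static List<String> erasedParameterTypes() {
        List<String> names = new ArrayList<>();
        for (Class<?> p : FlatMapLike.class.getDeclaredMethods()[0].getParameterTypes()) {
            names.add(p.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(typeArguments(Typed.class)); // [java.lang.String, java.lang.Integer]
        System.out.println(typeArguments(Raw.class));   // []
        System.out.println(erasedParameterTypes());     // [java.lang.Object, java.util.List]
    }
}
```

The erased signature is also the likely reason the hinted reify in the question is rejected: at runtime the interface method takes Object as its first parameter, so a ^String hint finds no matching method.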

Furthermore, you need to use deftype to define tokenizer and instantiate a new object when using it because Flink cannot handle anonymous classes:

(deftype tokenizer [] FlatMapFunction
                      (flatMap [this value collector] 
                        (println value)))

(.flatMap dataset (tokenizer.))

Here is a full "Word-Count-Example" that can be packed into a jar and executed.

Pay attention to the type hints and casts. The tokenizer output needs (int 1); otherwise the second field of the Tuple2 would be a Long, because Clojure integer literals are Longs. Furthermore, we declare the tokenizer's output type with a type string: a class object is not sufficient because the generic type parameters of Tuple2 must also be specified. Finally, we need to pass (int-array [0]) to resolve the overload of groupBy (without it, the call is ambiguous to the Clojure compiler).
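
Why the (int 1) cast matters can be sketched in plain Java. Generics are erased at runtime, so a tuple built without type checking can hold a Long in a field declared as Integer, and the mismatch only surfaces when the field is read as an Integer. Cell and BoxingDemo are hypothetical names, not Flink classes; the unchecked factory method stands in for what a Clojure constructor call effectively does.

```java
// Hypothetical stand-in for Tuple2<T0, T1>.
class Cell<A, B> {
    final A f0;
    final B f1;

    Cell(A f0, B f1) {
        this.f0 = f0;
        this.f1 = f1;
    }
}

class BoxingDemo {
    // Builds a Cell without compile-time checks, the way a dynamically
    // typed caller would: any Object is accepted for either field.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Cell<String, Integer> unchecked(Object word, Object count) {
        return new Cell(word, count);
    }

    // True if the second field can really be used as an Integer.
    static boolean countIsInteger(Cell<String, Integer> cell) {
        try {
            return cell.f1.intValue() >= 0 || true; // the compiler inserts a cast to Integer here
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A Long count, as a bare Clojure literal 1 would produce.
        System.out.println(countIsInteger(unchecked("me", 1L))); // false
        // An Integer count, as (int 1) would produce.
        System.out.println(countIsInteger(unchecked("me", 1)));  // true
    }
}
```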

(ns org.apache.flink.flink-clojure.WordCount
 (:import
 (org.apache.flink.api.common.functions FlatMapFunction)
 (org.apache.flink.api.java DataSet)
 (org.apache.flink.api.java ExecutionEnvironment)
 (org.apache.flink.api.java.tuple Tuple2)
 (org.apache.flink.util Collector)
 (java.lang String))
 (:require [clojure.string :as str])
 (:gen-class))

(def flink-env (ExecutionEnvironment/createLocalEnvironment))

(def text (.fromElements flink-env (to-array ["please test me and me too"])))

(deftype tokenizer [] FlatMapFunction
                      (flatMap [this value collector]
                        (doseq [v (str/split value #"\s")]
                          (.collect collector (Tuple2. v (int 1))))))

(def tokens (.returns (.flatMap text (tokenizer.)) "Tuple2<String,Integer>"))

(def counts (.sum (.groupBy tokens (int-array [0])) 1))

(defn -main []
  (.print counts))
Matthias J. Sax
  • Hi, unfortunately that didn't help, I still get the exception `The types of the interface org.apache.flink.api.common.functions.FlatMapFunction could not be inferred.` But thanks for your input, I will look into SingleInputUdfOperator some more. By the way, it should be `(.returns (.flatMap dataset (tokenizer)) String)` – knuth Aug 21 '15 at 06:59
  • So, I tried the same in Java directly. I modified the word-count example so that the Tokenizer only implements "FlatMapFunction" (without generics). Even using .returns("Tuple2") I get the same exception. So perhaps I need to write an additional Java wrapper for my purpose that handles the generic types. – knuth Aug 21 '15 at 07:20
  • Hi, I just had a closer look and what I suggested should work. It is a bug in the system. I just opened a JIRA for it: https://issues.apache.org/jira/browse/FLINK-2557 I guess, right now, you need to go the way with the additional Java wrapper. – Matthias J. Sax Aug 21 '15 at 11:35
  • Hello Matthias, thanks for your cooperation. I will try to get on with a java wrapper for now. – knuth Aug 21 '15 at 14:13
  • Hi, the bug is now fixed in the current master. I just updated my answer (including a full word-count example). – Matthias J. Sax Sep 21 '15 at 04:16
  • With the string type hint [deprecated in Flink 1.1](https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/datastream/SingleOutputStreamOperator.html), @MatthiasJ.Sax, do you have any ideas on how to implement this using the alternative `TypeHint` method signatures? – frank Jul 17 '16 at 14:10
  • I guess not at all... Clojure cannot handle Java generics AFAIK. You would need to define a custom class `MyType extends Tuple2` and use this instead (you can omit `returns` altogether, because for the custom "MyType" Flink will be able to determine the return type automatically). – Matthias J. Sax Jul 17 '16 at 14:48

As a follow-up to this comment: Stuck with type hints in clojure for generic class

With recent Flink versions (tested on 1.6.1), you need to define a custom class; otherwise you'll get an error like:

Exception in thread "main" java.lang.IllegalArgumentException: No matching method found: returns for class org.apache.flink.api.java.operators.FlatMapOperator, compiling:(WordCount.clj:69:13)

Custom class:

package org.apache.flink.java;

import org.apache.flink.api.java.tuple.Tuple2;


public class WordCountTuple extends Tuple2<String, Integer> {

}
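
Why a subclass like this helps can be sketched without Flink. A class that extends a generic type with concrete arguments records those arguments in its own class file, so they can be read back through reflection even though instances themselves carry no generic information. Pair, WordCountPair, and SuperclassDemo are hypothetical names standing in for Tuple2 and WordCountTuple; whether Flink reads the information exactly this way is an assumption.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's Tuple2<T0, T1>.
class Pair<A, B> {
    public A f0;
    public B f1;
}

// Mirrors WordCountTuple: the type arguments are fixed in the class declaration.
class WordCountPair extends Pair<String, Integer> {
}

class SuperclassDemo {
    // Names of the type arguments recorded for the superclass, or an
    // empty list if the superclass is not parameterized.
    static List<String> superclassTypeArguments(Class<?> cls) {
        List<String> names = new ArrayList<>();
        Type parent = cls.getGenericSuperclass();
        if (parent instanceof ParameterizedType) {
            for (Type arg : ((ParameterizedType) parent).getActualTypeArguments()) {
                names.add(arg.getTypeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(superclassTypeArguments(WordCountPair.class)); // [java.lang.String, java.lang.Integer]
        System.out.println(superclassTypeArguments(Pair.class));          // []
    }
}
```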

Clojure code:

(ns org.apache.flink.clojure.WordCount
  (:import
   (org.apache.flink.api.common.functions FlatMapFunction)
   (org.apache.flink.api.java DataSet)
   (org.apache.flink.api.java ExecutionEnvironment)
   (org.apache.flink.api.java.tuple Tuple2)
   (org.apache.flink.java WordCountTuple)
   (org.apache.flink.util Collector)
   (java.lang String))
  (:require [clojure.string :as str])
  (:gen-class))

(def flink-env (ExecutionEnvironment/getExecutionEnvironment))

(def text (.fromElements flink-env (to-array ["please test me and me too"])))

(deftype tokenizer [] FlatMapFunction
         (flatMap [this value collector]
           (doseq [v (str/split value #"\s")]
             (.collect collector (Tuple2. v (int 1))))))

(def tokens (.returns (.flatMap text (tokenizer.)) WordCountTuple))

(def counts (.sum (.groupBy tokens (int-array [0])) 1))

(defn -main []
  (.print counts))

A working example fork is here: https://github.com/guillaume/flink-external