I've got two Kafka Streams with keys in String and values in Avro format which I have created using KSQL.

Here's the first one:

DESCRIBE EXTENDED STREAM_1; 
Type                 : STREAM
Key field            : IDUSER
Timestamp field      : Not set - using <ROWTIME>
Key format           : STRING
Value format         : AVRO
Kafka output topic   : STREAM_1 (partitions: 4, replication: 1)

 Field                      | Type
--------------------------------------------------------
 ROWTIME                    | BIGINT           (system)
 ROWKEY                     | VARCHAR(STRING)  (system)
 FIRSTNAME                  | VARCHAR(STRING)
 LASTNAME                   | VARCHAR(STRING)
 IDUSER                     | VARCHAR(STRING)

and the second one:

DESCRIBE EXTENDED STREAM_2;
Type                 : STREAM
Key field            : IDUSER
Timestamp field      : Not set - using <ROWTIME>
Key format           : STRING
Value format         : AVRO
Kafka output topic   : STREAM_2 (partitions: 4, replication: 1)

 Field                      | Type
--------------------------------------------------------
 ROWTIME                    | BIGINT           (system)
 ROWKEY                     | VARCHAR(STRING)  (system)
 USERNAME                   | VARCHAR(STRING)
 IDUSER                     | VARCHAR(STRING)
 DEVICE                     | VARCHAR(STRING)

The desired output should include IDUSER, LASTNAME, DEVICE and USERNAME.

I want to left join these streams (on IDUSER) using the Kafka Streams API and write the output to a Kafka topic.

To do so, I've tried the following:

public static void main(String[] args) {

    final Properties streamsConfiguration = new Properties();

    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-streams");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
    streamsConfiguration.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");

    streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
    streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    final Serde<String> stringSerde = Serdes.String();
    final Serde<GenericRecord> genericAvroSerde = new GenericAvroSerde();


    boolean isKeySerde = false;
    genericAvroSerde.configure(Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"), isKeySerde);

    KStreamBuilder builder = new KStreamBuilder();

    KStream<String, GenericRecord> left = builder.stream("STREAM_1");
    KStream<String, GenericRecord> right = builder.stream("STREAM_2");

    // Java 8+ example, using lambda expressions
    KStream<String, GenericRecord> joined = left.leftJoin(right,
        (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
        JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
        Joined.with(
          stringSerde, /* key */
          genericAvroSerde,   /* left value */
          genericAvroSerde)  /* right value */
      );
    joined.to(stringSerde, genericAvroSerde, "streams-output-testing");

    KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
    streams.cleanUp();
    streams.start();

    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}

However,

KStream<String, GenericRecord> joined = ...

is flagged with a compile error in my IDE:

incompatible types: inference variable VR has incompatible bounds

When I use a String Serde for both keys and values, it works, but the data is not very readable in kafka-console-consumer. I want to produce the data in Avro format so that I can read it with kafka-avro-console-consumer.

Matthias J. Sax
Giorgos Myrianthous

2 Answers

4

My first guess is that you are returning a String from the join operation, whereas your code expects a GenericRecord as the result:

KStream<String, GenericRecord> joined = left.leftJoin(right,
    (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, ...)

Note how joined has type KStream<String, GenericRecord>, i.e. the value has type GenericRecord, but the join output is computed via "left=" + leftValue + ", right=" + rightValue, which has type String.
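
To see why the compiler reports the VR error, here is a minimal sketch that needs no Kafka dependency. `Joiner` and `joinOnce` are hypothetical stand-ins for `ValueJoiner` and `leftJoin` (they are not the Kafka Streams API); they only reproduce the generic shape, in which the result type VR is whatever the joiner lambda returns.

```java
// Hypothetical stand-in for Kafka Streams' ValueJoiner<V1, V2, VR>; not the real interface.
interface Joiner<V1, V2, VR> {
    VR apply(V1 left, V2 right);
}

class JoinerInferenceSketch {
    // Mimics the generic shape of leftJoin: the result type VR is dictated by the joiner.
    static <V1, V2, VR> VR joinOnce(V1 left, V2 right, Joiner<V1, V2, VR> joiner) {
        return joiner.apply(left, right);
    }

    public static void main(String[] args) {
        // String concatenation makes the lambda return a String, so VR must be String.
        String asText = joinOnce("a", "b", (l, r) -> "left=" + l + ", right=" + r);
        System.out.println(asText);

        // Assigning the same call to any non-String type is rejected by the compiler,
        // which is the situation in the question (String result, GenericRecord expected):
        // Integer wrong = joinOnce("a", "b", (l, r) -> "left=" + l + ", right=" + r);
    }
}
```

The fix is therefore either to change the declared value type of the result to String, or to make the joiner return a GenericRecord.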

miguno
  • Thanks for your reply. Can you suggest a workaround? How can I instantiate a GenericRecord from a join so that in the very end a value in AVRO format is produced? – Giorgos Myrianthous May 07 '18 at 14:05
  • I have updated my question to include the actual fields for both streams in order to get a taste of what I am trying to achieve. – Giorgos Myrianthous May 07 '18 at 14:25
  • Take a look at the Avro API on how to instantiate a GenericRecord. This step is not specific to Kafka Streams. I'd suggest to create a separate question for this on SO if need be. – miguno May 07 '18 at 16:00
  • I'll have a look into avro API. A last question which will make my life easier when constructing the `GenericRecord`. What will `leftValue` and `rightValue` normally contain? In this particular case will `leftValue` contain `FIRSTNAME`, `LASTNAME` and `rightValue` `USERNAME` and `DEVICE`? – Giorgos Myrianthous May 07 '18 at 19:12
  • leftValue will be the GenericRecord pojo from the left side of the join (`STREAM_1`), rightValue will be the GenericRecord pojo from the right side (`STREAM_2`). – miguno May 08 '18 at 06:30
  • See the API of GenericRecord (in Avro) on how to access your fields. For example, you can do `leftValue.get("FIRSTNAME")`. – miguno May 08 '18 at 06:37
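
Putting these comments together, the joiner logic for the desired output (IDUSER, LASTNAME, DEVICE, USERNAME) might look roughly like the sketch below. To keep it runnable without Avro on the classpath, a plain `Map<String, Object>` stands in for `GenericRecord`, and `joinValues` is a hypothetical helper name. In the real topology you would read fields with `leftValue.get(...)` as described above and put them into a record built from your own output schema.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class LeftJoinValueSketch {
    // Assumption: Map<String, Object> is only a stand-in for Avro's GenericRecord here.
    static Map<String, Object> joinValues(Map<String, Object> left, Map<String, Object> right) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("IDUSER", left.get("IDUSER"));
        out.put("LASTNAME", left.get("LASTNAME"));
        // In a left join the right value is null when nothing from STREAM_2 matched.
        out.put("DEVICE", right == null ? null : right.get("DEVICE"));
        out.put("USERNAME", right == null ? null : right.get("USERNAME"));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> left = new HashMap<>();
        left.put("IDUSER", "u1");
        left.put("FIRSTNAME", "Jane");
        left.put("LASTNAME", "Doe");

        Map<String, Object> right = new HashMap<>();
        right.put("IDUSER", "u1");
        right.put("USERNAME", "jdoe");
        right.put("DEVICE", "phone");

        System.out.println(joinValues(left, right)); // matched record
        System.out.println(joinValues(left, null));  // no match on the right side
    }
}
```

The null check matters because this is a left join: a left record with no match still produces an output, so the fields taken from the right side have to be optional in the output schema.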
0

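Instead of converting the values to a String, you can return one of the values directly. For example:

KStream<String, GenericRecord> joined = left.leftJoin(right,
    (leftValue, rightValue) -> rightValue, /* null when the left record has no match */
    JoinWindows.of(TimeUnit.MINUTES.toMillis(5)));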
darshan