
[Screenshot: the output on the consumer console]

The application (.java) file is given below:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindowedDeserializer;
import org.apache.kafka.streams.kstream.TimeWindowedSerializer;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WordCountFinal {

    public static void main(String[] args) {

        StringSerializer stringSerializer = new StringSerializer();
        StringDeserializer stringDeserializer = new StringDeserializer();
        TimeWindowedSerializer<String> windowedSerializer = new TimeWindowedSerializer<>(stringSerializer);
        TimeWindowedDeserializer<String> windowedDeserializer = new TimeWindowedDeserializer<>(stringDeserializer);
        Serde<Windowed<String>> windowedSerde = Serdes.serdeFrom(windowedSerializer, windowedDeserializer);


        Properties streamsConfiguration = new Properties();
        streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "rogue");
        streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "ssc-vm-r.com:9092,ssc-vmr:9092,ssc-vm:9092");
        streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> wordcountinput = builder.stream("TextLinesTopic", Consumed.with(Serdes.String(), Serdes.String()));

        KGroupedStream<String, String> groupedStream = wordcountinput
                .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
                .map((key, word) -> new KeyValue<>(word, word))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()));

        KTable<Windowed<String>, Long> aggregatedStream = groupedStream
                .windowedBy(TimeWindows.of(Duration.ofMinutes(2)))
                .count();

        aggregatedStream.toStream().to("tuesdaystopic", Produced.with(windowedSerde, Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfiguration);
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));

    }

}
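As a side note, the tokenization inside flatMapValues can be tried out in isolation. The sketch below is plain Java with no Kafka dependency, and the sample input line is made up for illustration; it shows what the lowercase/split step produces before each word becomes a (word, word) record for groupByKey:

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeSketch {
    // Mirrors the flatMapValues lambda above: lowercase the line,
    // then split on runs of non-word characters.
    static List<String> tokenize(String line) {
        return Arrays.asList(line.toLowerCase().split("\\W+"));
    }

    public static void main(String[] args) {
        // prints [qwerty, qwerty, abcd]
        System.out.println(tokenize("Qwerty qwerty, Abcd!"));
    }
}
```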

The input to the producer console is sentences or words. The output should be like a regular word-count app, but reset every 2 minutes: suppose the count for 'qwerty' is currently 5; if I enter 'qwerty' again in the producer console after two minutes, the output count should be 1.

qwerty 3

qwerty 4

qwerty 5

abcd 1

after 2 minutes, entering qwerty in the producer console again:

qwerty 1
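The reset behavior shown above follows from tumbling windows: each record is assigned to a fixed, non-overlapping 2-minute bucket based on its timestamp, so a record arriving past the window boundary starts a fresh count. A minimal sketch of that bucket assignment in plain Java (window size assumed to be 2 minutes as in the app; tumbling windows are aligned to the epoch):

```java
public class WindowBucketSketch {
    static final long WINDOW_SIZE_MS = 2 * 60 * 1000; // 2 minutes

    // The window start is the record timestamp rounded down
    // to the nearest window boundary.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    public static void main(String[] args) {
        long t1 = 1_000_000;           // some record timestamp
        long t2 = t1 + 30_000;         // 30s later: same window, count keeps growing
        long t3 = t1 + WINDOW_SIZE_MS; // 2 min later: next window, count restarts at 1
        System.out.println(windowStart(t1) == windowStart(t2)); // true
        System.out.println(windowStart(t1) == windowStart(t3)); // false
    }
}
```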

Suprit

1 Answer


Note that the key type of the result is Windowed<String>; that is also why a TimeWindowedSerializer is used when writing the result stream to a topic via to() (not a StringSerializer).

When you read the data with the console consumer, however, you specify StringDeserializer for the key; the bytes in the key are not of type String, so the types don't match and you get those unreadable characters.

You can either specify a different deserializer (i.e., TimeWindowedDeserializer) when using the console consumer, or modify the key to type String before writing the result into the output topic. For example, you could use:

aggregatedStream.toStream()
    // the key `k` is of type Windowed<String>;
    // the plain String key is available via `key()`
    .selectKey((k, v) -> k.key())
    .to(....)
Matthias J. Sax
  • Thank you Matthias! The conversion part worked. But for the consumer console, TimeWindowedDeserializer should actually work, yet it doesn't. Thanks anyway, this helped a lot. – Suprit Oct 08 '20 at 05:31
  • How did you try to use `TimeWindowedDeserializer` and why did it not work for you? – Matthias J. Sax Oct 08 '20 at 06:14
  • i used this on consumer console ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic tuesdaystopic --from-beginning --formatter kafka.tools.DefaultMessageFormatter --property print.key=true --property print.value=true --property key.deserializer=org.apache.kafka.streams.kstream.TimeWindowedDesrializer --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer – Suprit Oct 08 '20 at 09:46
  • For other variations it couldn't find the class. I guess I'm going wrong somewhere; please correct me. – Suprit Oct 08 '20 at 09:49
  • And what was the problem with the command? Did it fail? Or what did it print? – Matthias J. Sax Oct 08 '20 at 18:58
  • No, it didn't print; it gave an error saying class not found (ClassNotFoundException) for the org.apache.kafka...... class. – Suprit Oct 09 '20 at 05:42
  • If it does not find the class, it seems it's missing from the classpath -- note that `TimeWindowedSerde` ships with the `kafka-streams` package, not `kafka-clients`. It should be possible to add the missing jar via the `$CLASSPATH` env variable. – Matthias J. Sax Oct 09 '20 at 17:30