How to modify KStream key and values in Kafka word count program?

Question

I am new to Kafka Streams and stuck on a basic word count program. In the program below, I am trying to change the case of the value, but it's not working (`val wordCountInputProcessed = wordCountInput.mapValues(value => value.toLowerCase)`). Is there anything wrong here?

Kafka Streams version => 2.3.0

Scala version => 2.11.8

import java.util._
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.streams.{KafkaStreams,StreamsBuilder, StreamsConfig}
import org.apache.kafka.common.serialization.{StringDeserializer,LongDeserializer}

object WordCount {
  def main(args: Array[String]): Unit = {

    val config = new Properties()

    config.put(StreamsConfig.APPLICATION_ID_CONFIG,"word-count-example")
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092")
    config.put(ConsumerConfig.AutoOffsetReset,"earliest")
    config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,classOf[StringDeserializer])
    config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,classOf[StringDeserializer])



    val builder = new StreamsBuilder
    val wordCountInput = builder.stream[String,String]("streams-plaintext-input")

    val wordCountInputProcessed = wordCountInput.mapValues(value => value.toLowerCase)

    wordCountInputProcessed.to("streams-plaintext-output")

    val streams = new KafkaStreams(builder.build(),config)
    streams.start()
    println(streams.toString)

  }
}

Here's the snapshot of this issue.

Shouldn't it be String instead of Nothing?

Can you provide the full program? It's unclear what exactly you are doing. It seems you don't use the result of `mapValues()` for further processing. Note that the input `wordCountInput` `KStream` is immutable, and every operation returns a new `KStream` object that you need to use for further processing. — Matthias J. Sax, Mar 22 '20 at 00:33
@MatthiasJ.Sax I just updated with the full program along with the error snapshot. — Goldie, Mar 22 '20 at 05:32

Tuyen Luong · Answer 1 · 2020-03-22T07:17:42.447

You have to re-assign the transformed KStream to the KStream var wordCountInput (declared with `var` rather than `val`); otherwise wordCountInput still refers to the initial KStream. Something like this:

wordCountInput = wordCountInput.mapValues(value => value.toLowerCase)
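
The same idea also works without re-assignment, by giving the result its own name. This is only a minimal sketch, assuming the Java `StreamsBuilder` from the question. The object name `ImmutableKStreamSketch` is made up, and the explicit `ValueMapper` is there only so the snippet does not depend on lambda-to-SAM conversion, which Scala 2.11 does not apply by default.

```scala
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.{KStream, ValueMapper}

// Hypothetical object name, used only for illustration.
object ImmutableKStreamSketch {
  val builder = new StreamsBuilder

  // The source stream; mapValues never changes this object.
  val input: KStream[String, String] =
    builder.stream[String, String]("streams-plaintext-input")

  // mapValues returns a new KStream that carries the lowercase values.
  val lowered: KStream[String, String] =
    input.mapValues[String](new ValueMapper[String, String] {
      override def apply(value: String): String = value.toLowerCase
    })

  // Write the transformed stream, not the original one.
  lowered.to("streams-plaintext-output")
}
```

Building the topology like this does not need a running broker; only `KafkaStreams.start()` does.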

Updated

I made some other changes and the application runs just fine.

Kafka Streams uses a Serde class that wraps a serializer and a deserializer (for strings, `StringSerializer`/`StringDeserializer`), so change the serde class config from `StringDeserializer` to `StringSerde`:

config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,classOf[StringSerde])
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,classOf[StringSerde])
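
To see why the config wants a serde rather than a bare deserializer, here is a small sketch of what a `Serde` bundles. It assumes only `kafka-clients` on the classpath; the object name `SerdeRoundTripSketch` and the topic name `demo-topic` are made up for the example.

```scala
import org.apache.kafka.common.serialization.{Serde, Serdes}

// Hypothetical object name, used only for illustration.
object SerdeRoundTripSketch {
  // A Serde exposes both a Serializer and a Deserializer for one type.
  val stringSerde: Serde[String] = Serdes.String()

  // "demo-topic" is a made-up name; the String serde does not use it.
  val bytes: Array[Byte] =
    stringSerde.serializer().serialize("demo-topic", "Hello Kafka")

  // Reading the bytes back gives the original value.
  val restored: String =
    stringSerde.deserializer().deserialize("demo-topic", bytes)

  def main(args: Array[String]): Unit = println(restored)
}
```

Kafka Streams needs both directions because it reads from and writes to topics, which is why a deserializer class alone is not accepted for the `DEFAULT_*_SERDE_CLASS_CONFIG` settings.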

An additional tip: it's easier to check whether you are receiving new messages if you put debug output into your Streams DSL code. I usually debug like this:

val wordCountInputProcessed = wordCountInput
      .mapValues(value => {
        println("origin " + value)
        println("lowercase " + value.toLowerCase)
        value.toLowerCase
      })

You can also put debug inside the mapValues.

Update 1

Here is the full application:


import java.util.Properties

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.Serdes.StringSerde
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

object WordCount {
  def main(args: Array[String]): Unit = {

    val config = new Properties

    config.put(StreamsConfig.APPLICATION_ID_CONFIG,"word-count-example")
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092")
    config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest")
    config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,classOf[StringSerde])
    config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,classOf[StringSerde])



    val builder = new StreamsBuilder
    val wordCountInput = builder.stream[String,String]("streams-plaintext-input")

    val wordCountInputProcessed = wordCountInput
      .mapValues(value => {
        println("origin " + value)
        value.toLowerCase
      })

    wordCountInputProcessed.mapValues(value => {
      println("lowercase " + value)
      value
    })

    wordCountInputProcessed.to("streams-plaintext-output")

    val streams = new KafkaStreams(builder.build(),config)
    streams.start()
    println(streams.toString)

  }
}

Have you tried it yet? Can you chain another `mapValues` call right after your first `mapValues` to print debug info about `value`, or print the debug info directly in the convert-to-lowercase `mapValues`? — Tuyen Luong, Mar 22 '20 at 03:03
Yes, I tried it before, but that doesn't seem to be the problem. I was actually trying to chain the complete transformations and action in one line. Anyway, I have updated the question with the full program, please have a look. — Goldie, Mar 22 '20 at 05:34
It's still not working for me... which Kafka version were you trying this example in? — Goldie, Mar 22 '20 at 07:09
I updated the full application. I'm using Scala 2.13.1 and Kafka 2.4.0; what version are you using? I added this sbt dependency: `libraryDependencies += "org.apache.kafka" %% "kafka-streams-scala" % "2.4.0"` — Tuyen Luong, Mar 22 '20 at 07:18
It's working fine now. I used the Scala library's StreamsBuilder instead of the standard one, along with a few tweaks for implicit conversion. Thanks for your time! — Goldie, Mar 22 '20 at 11:08

score 3 · Accepted Answer · answered Mar 22 '20 at 17:21

I switched from the Java API to the Kafka Streams DSL for Scala, and that solved the problem. I am also using the following modules, for these reasons:

org.apache.kafka.streams.scala.ImplicitConversions: Module that brings into scope the implicit conversions between the Scala and Java classes.

org.apache.kafka.streams.scala.Serdes: Module that contains all primitive SerDes that can be imported as implicits and a helper to create custom SerDes.

Please refer to this documentation for more details (Topic: KAFKA STREAMS DSL FOR SCALA) => https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#scala-dsl
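
As a rough illustration of what those two imports do, the sketch below builds only the lowercase step with the Scala DSL and prints the resulting topology. It assumes the 2.3/2.4-era `kafka-streams-scala` imports used in this thread (later releases may organise them differently), and the object name `ScalaDslSketch` is made up.

```scala
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._

// Hypothetical object name, used only for illustration.
object ScalaDslSketch {
  val builder = new StreamsBuilder

  // stream() needs a Consumed[String, String]; it is derived implicitly
  // from the String serde brought into scope by the imports above.
  val lowered = builder
    .stream[String, String]("streams-plaintext-input")
    .mapValues(value => value.toLowerCase)

  // to() needs a Produced[String, String], derived the same way.
  lowered.to("streams-plaintext-output")

  // Describing the topology does not require a running broker.
  val description: String = builder.build().describe().toString

  def main(args: Array[String]): Unit = println(description)
}
```

With the Scala `StreamsBuilder`, `value` is typed as `String` inside the lambda, so the compiler no longer falls back to `Nothing`.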

import java.time.Duration
import java.util._
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.StreamsBuilder

// Import for Scala DSL
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._

object WordCount {

  def main(args: Array[String]): Unit = {

    val config = new Properties()

    config.put(StreamsConfig.APPLICATION_ID_CONFIG,"word-count-example")
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092")
    config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest")
    config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,classOf[Serdes.StringSerde])
    config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,classOf[Serdes.LongSerde])

    val builder = new StreamsBuilder
    val wordCountInput = builder.stream[String,String]("streams-plaintext-input")

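    val wordCountInputProcessed = wordCountInput
      .mapValues(value => value.toLowerCase())  // normalize case
      .flatMapValues(x => x.split(" "))         // one record per word
      .selectKey((key, value) => value)         // use the word as the key
      .groupByKey                               // group records by word
      .count                                    // KTable of word -> count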

    wordCountInputProcessed.toStream.to("streams-plaintext-output")

    val streams = new KafkaStreams(builder.build(),config)
    streams.start()
    println(streams.toString)

    sys.ShutdownHookThread {
      streams.close(Duration.ofSeconds(10))
    }

  }
}
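
To check the transformation logic without Kafka, the same steps can be mirrored on an in-memory list. This is only a sketch with plain Scala collections: the sample lines and the object name `WordCountLogicSketch` are made up, and unlike a `KTable`, which emits updated counts as records arrive, it computes the final counts once.

```scala
// Hypothetical object name, used only for illustration.
object WordCountLogicSketch {
  // Made-up sample input standing in for records on the input topic.
  val lines: List[String] = List("Hello Kafka", "hello streams")

  val counts: Map[String, Long] =
    lines
      .map(_.toLowerCase)      // corresponds to mapValues
      .flatMap(_.split(" "))   // corresponds to flatMapValues
      .groupBy(identity)       // corresponds to selectKey + groupByKey
      .map { case (word, occurrences) => word -> occurrences.size.toLong } // count

  def main(args: Array[String]): Unit =
    counts.toSeq.sorted.foreach(println)
}
```

For this sample input, "hello" is counted twice and "kafka" and "streams" once each.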
