
I am working with Kafka Streams and KSQL, and I am facing the issues described below.

Details about what I have done so far:

I created the following topics:

./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bptcus

./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic address-elasticsearch-sink

Then I created a table and a stream over those topics:

CREATE table CUSTOMER_SRC  (customerId VARCHAR,name VARCHAR, age VARCHAR, address VARCHAR) WITH (KAFKA_TOPIC='bptcus', VALUE_FORMAT='JSON', KEY='customerId');

CREATE stream ADDRESS_SRC (addressId VARCHAR, city VARCHAR, state VARCHAR) WITH (KAFKA_TOPIC='address-elasticsearch-sink', VALUE_FORMAT='JSON');

I am able to see the data as below:

select * from customer_src;  
1528743137610 | Parent-1528743137047 | Ron | 31 | [{"addressId":"1","city":"Fremont","state":"CA"},{"addressId":"2","city":"Dallas","state":"TX"}] 

select * from address_src;  
1528743413826 | Parent-1528743137047 | 1 | Detroit | MI

I then created another stream by joining the table and stream above:

CREATE stream CUST_ADDR_SRC as select c.name , c.age , c.address, a.rowkey, a.addressId , a.city , a.state  from  ADDRESS_SRC a left join CUSTOMER_SRC c  on c.rowkey=a.rowkey;

I am able to see the data in the CUST_ADDR_SRC stream as below:

select * from cust_addr_src;  

1528743413826 | Parent-1528743137047 | Ron | 31 | [{"addressId":"1","city":"Fremont","state":"CA"},{"addressId":"2","city":"Dallas","state":"TX"}] | Parent-1528743137047 | 1 | Detroit | MI  

My Questions:

  1. Now I want to replace the address with addressId 1 (Fremont, CA) by the new address with addressId 1 (Detroit, MI). How can I do that?
  2. I also tried to print the stream input to the console, as described in the question

Print Kafka Stream Input out to console?

Here is my code:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.ForeachAction;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    Properties config = new Properties();
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, "cusadd-application");
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "10.1.61.125:9092");
    config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "10.1.61.125:2181");
    config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> source = builder.stream("cust_addr_src");

    // Print every record read from the topic
    source.foreach(new ForeachAction<String, String>() {
        public void apply(String key, String value) {
            System.out.println("Stream key values are: " + key + ": " + value);
        }
    });

    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();

I don't see the printed records. All I can see is debug output like the following:

    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Resetting offset for partition cust_addr_src-0 to latest offset.
    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.NetworkClient - Initiating connection to node 0 at hsharma-mbp15.local:9092.
    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name node-0.bytes-sent
    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name node-0.bytes-received
    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name node-0.latency
    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.NetworkClient - Completed connection to node 0
    12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Fetched offset 0 for partition cust_addr_src-0
    12:04:42.676 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name topic.cust_addr_src.bytes-fetched
    12:04:42.680 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name topic.cust_addr_src.records-fetched
    12:04:45.150 [StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Received successful heartbeat response for group cusadd-application.

Thanks in advance.

  • Why is it tagged with [tag:apache-spark]? – Alper t. Turker Jun 11 '18 at 20:00
  • It's unclear why you mix Kafka Streams and KSQL? Do you want to have a KSQL or Kafka Streams solution? – Matthias J. Sax Jun 11 '18 at 20:55
  • @MatthiasJ.Sax, I have not decided yet to which one I have to go for as of now. Can you please provide both the solutions? – hithendra sharma Jun 11 '18 at 21:20
  • @MatthiasJ.Sax, Can you please also suggest which is the best solution( ksql or kafka stream) to implement? – hithendra sharma Jun 11 '18 at 23:10
  • KSQL creates Kafka Streams applications. If you can do your operation in KSQL, then you can use it. Otherwise, if you need more flexible code and programming, use Kafka Streams or the lower-level consumer. – OneCricketeer Jun 12 '18 at 00:08
  • @MatthiasJ.Sax, I created the stream CUST_ADDR_SRC with the KSQL query I posted above. After creating it, I executed `show streams`, which returned: `Stream Name | Kafka Topic | Format: CUST_ADDR_SRC | CUST_ADDR_SRC | JSON`. Our consumer application should read from the CUST_ADDR_SRC topic (associated with the CUST_ADDR_SRC stream) only when there is an address change published to the ADDRESS_SRC topic. I don't know how to do this in Kafka Streams, which is why I started using KSQL. – hithendra sharma Jun 12 '18 at 01:07
  • @cricket_007, I have posted my requirement in my previous comment why i used ksql and kafka stream. Can you please suggest me how to read from the stream and replace the json object? – hithendra sharma Jun 12 '18 at 01:18
  • If all you want to do is change a value in one `KStream` into another one, that is what the [`.map()` function is for](https://kafka.apache.org/10/javadoc/org/apache/kafka/streams/kstream/KStream.html#map-org.apache.kafka.streams.kstream.KeyValueMapper-) – OneCricketeer Jun 12 '18 at 03:14
  • @cricket_007, thanks for the quick response. Can you please elaborate with code? – hithendra sharma Jun 12 '18 at 04:29
  • `builder.stream("cust_addr_src").map((k,v) -> ...)` – OneCricketeer Jun 12 '18 at 12:54
  • Thanks. Will try and let you know – hithendra sharma Jun 12 '18 at 17:04
  • I tried with `KStream uppercased = stream.mapValues(new ValueMapper() { @Override public String apply(String s) { System.out.println("String is: " + s); return s.toUpperCase(); } });` but no luck, it is still not printing any string. I tried `mapValues` because I need to apply a transformation on the value. – hithendra sharma Jun 14 '18 at 00:03
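To make the `mapValues` suggestion from the comments concrete, here is a minimal sketch against the same pre-1.0 Streams API used in the question (imports as in the question, plus `org.apache.kafka.streams.kstream.ValueMapper`). The output topic name and the string-replace logic are illustrative assumptions, not the definitive fix:

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> source = builder.stream("cust_addr_src");

    // Rewrite each record value. A real implementation would parse the JSON,
    // swap the matching address entry, and re-serialize it rather than doing
    // a naive string substitution.
    KStream<String, String> updated = source.mapValues(new ValueMapper<String, String>() {
        @Override
        public String apply(String value) {
            return value.replace("\"city\":\"Fremont\"", "\"city\":\"Detroit\"");
        }
    });

    updated.to("cust_addr_updated"); // hypothetical output topic

    KafkaStreams streams = new KafkaStreams(builder, config); // config as in the question
    streams.start(); // records only flow once the topology is started

Note also that the debug log above shows "Resetting offset for partition cust_addr_src-0 to latest offset": with the consumer's default `auto.offset.reset` of `latest`, only records produced after the application starts will be seen.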

1 Answer


I see two approaches:

  1. String manipulation: the address column is currently a STRING containing a JSON object. You could just use string manipulation functions to replace the bits you want, though this seems hacky.
  2. Struct manipulation: change your CREATE TABLE statement so that the address column is an ARRAY<STRUCT<addressId STRING, city STRING, state STRING>> rather than a string (see the DDL sketch below). You can then use the elements of the array and the fields of the struct to build the output, e.g.
ARRAY[
  STRUCT(
    addressId := address[0]->addressId,
    city := address_src->city,
    state := address[0]->state
  ),
  ... same for second element
]

The above will create an array containing two structs, with the new city set.
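For reference, a sketch of the restructured DDL for approach 2, reusing the topic, key, and column types from the question's original CREATE TABLE:

CREATE table CUSTOMER_SRC (customerId VARCHAR, name VARCHAR, age VARCHAR, address ARRAY<STRUCT<addressId VARCHAR, city VARCHAR, state VARCHAR>>) WITH (KAFKA_TOPIC='bptcus', VALUE_FORMAT='JSON', KEY='customerId');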

Of course, this only works if there are always exactly two elements in the array. If the number of elements varies, you'd need a long CASE expression that does different things based on the size of the array, e.g.

CASE
   WHEN ARRAY_LENGTH(address) = 1
     THEN ARRAY[STRUCT(addressId := address[0]->addressId, city := address_src->city, state := address[0]->state)]
   WHEN ARRAY_LENGTH(address) = 2
     THEN ARRAY[... with two elements ...]
   WHEN ARRAY_LENGTH(address) = 3
     THEN ARRAY[... with three elements ...]
END

Etc.
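In context, such a CASE expression would take the place of the raw c.address column in the join from the question. A sketch (the stream name CUST_ADDR_FIXED is made up here, and the elided branches follow the pattern above):

CREATE stream CUST_ADDR_FIXED as
  select c.name, c.age,
         CASE
           WHEN ARRAY_LENGTH(c.address) = 1
             THEN ARRAY[STRUCT(addressId := c.address[0]->addressId, city := a.city, state := c.address[0]->state)]
           ... remaining branches as above ...
         END as address,
         a.addressId, a.city, a.state
  from ADDRESS_SRC a left join CUSTOMER_SRC c on c.rowkey=a.rowkey;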

Andrew Coates