
I am facing an issue in joining a KStream with a GlobalKTable and would appreciate your help.

Given the two Kafka topics orders and customers:

Orders

"1"     {"ID":"1","Name":"Myorder1","CustID":"100"}

"2"     {"ID":"2","Name":"MyOrder2","CustID":"200"}

Customers

"100"   {"CustID":"100","CustName":"Customer1"}

"200"   {"CustID":"200","CustName":"Customer2"}

The requirement is to enrich the stream of orders with the customer name:

"1"     {"ID":"1","Name":"Myorder1","CustID":"100","CustName":"Customer1"}

"2"     {"ID":"2","Name":"MyOrder2","CustID":"200","CustName":"Customer2"}

I am trying the following:

  1. Build a KStream from the orders topic
  2. Build a GlobalKTable from the customers topic
  3. Build another stream which joins Orders and Customers (look up Order.CustID in the Customer table)

KStream<String, EnrichedOrder> enrichedstreams = orders.join(
    customers,
    new KeyValueMapper<String, Order, String>() {            
        @Override
        public String apply(String key, Order value) {
           return value.CustID;
        }
    },
    new ValueJoiner<Order,Customer, EnrichedOrder>() {
        @Override
        public EnrichedOrder apply(Order order, Customer customer) {
            EnrichedOrder eorder = new EnrichedOrder();
            eorder.CustID = order.CustID;
            eorder.CustName = customer.CustName;
            eorder.ID = order.ID;
            eorder.Name = order.Name;           
            return eorder;
        }
    }
);

But the join does not produce any result and does not throw any exception either.

When using a leftJoin, I get a NullPointerException for Customer.

Please let me know in case you have faced a similar issue and suggest how to fix this.

deepak
  • Make sure that the `GlobalKTable` is entirely populated when the stream processing starts. It may be the case that orders are already processing while the customer table is still being populated. To avoid this, start the streams application and only then produce new order events. Also you may need to reset offsets when running your tests multiple times. – user152468 Jun 04 '19 at 06:38
  • Is there any way to check whether the GlobalKTable is entirely populated? Something like a foreach loop on a KStream. – deepak Jun 04 '19 at 07:43
  • @deepak, could you provide the code by which you create the GlobalKTable? – dmkvl Jun 04 '19 at 08:25
  • @deepak, make sure your messages (orders and customers) have keys. By the way, do you really need a GlobalKTable (and not KTable)? – dmkvl Jun 04 '19 at 08:28
  • @dmkvl , yes both have the keys . please see Orders and Customers message in post. – deepak Jun 04 '19 at 08:33
  • @dmkvl please find the code: `GlobalKTable<String, Customer> customers = builder.globalTable("Customer", Consumed.with(Serdes.String(), customerSerde), Materialized.<String, Customer, KeyValueStore<Bytes, byte[]>>as("my-state-store"));` – deepak Jun 04 '19 at 08:36
  • @deepak, you can check if the GlobalKTable is populated: `ReadOnlyKeyValueStore kss = streams.store(GlobalKTable.queryableStoreName(), QueryableStoreTypes.keyValueStore())` – dmkvl Jun 04 '19 at 10:01
  • @deepak , in the copy/paste of your events, it looks like the keys are `"100"` and `"200"` - I mean **the double-quotes seem to be part of the strings**. (I do not know if it is the real format of your keys or if it is simply an issue in the copy/paste.) If this is the case, the join between `order.CustID` ( `100` ) and the customer key ( `"100"` ) will not work. – Val Bonn Jun 05 '19 at 07:29
  • @ValBonn for this example ,both keys and all values are string only. order.CustID also "100". – deepak Jun 06 '19 at 04:52
  • What I meant is that your order's CustID looks like the String `100`, while the keys look like the String `"100"`. In other words, you try to join `""100""` with `"100"`. In your copy-paste, if your key were the String 100, I would have expected to see `100 {"CustID":"100",...` instead of `"100" {"CustID":"100",...`. To be sure of that, could you share the code that produces the events in the `customers` topic? – Val Bonn Jun 06 '19 at 06:34
  • Thanks @ValBonn. That was the issue. We were using the JDBC Source connector to pull data from SQL Server, and it was somehow producing the key as ""100"". I wrote a custom producer and pushed the right data to the topics. It worked. Thanks again for your help!!! – deepak Jun 06 '19 at 10:55
  • cool. I will write an answer in this case :) – Val Bonn Jun 06 '19 at 22:18
  • @user152468 -- your initial concern about "loading" the table should not apply -- on startup, a `GlobalKTable` is bootstrapped to the end of the topic before any processing begins (this is a difference to `KTable`s that provide time-synchronized joining/processing while `GlobalKTable`s are not time-synchronized). – Matthias J. Sax Jun 09 '19 at 02:09

2 Answers


Let's look carefully at the content of your copy-paste:

In the customers topic:

"100"   {"CustID":"100","CustName":"Customer1"}

You can notice the key is a String, and this String contains double-quotes: "100". Usually, the string keys are printed without the double-quotes. I would rather have expected to see:

 100    {"CustID":"100","CustName":"Customer1"}

In other words, the Java String representation of your key is ""100"" (or "\"100\"") and not "100" as we would expect.
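To make the difference concrete, here is a minimal sketch in plain Java (no Kafka involved). The class name `QuotedKeyCheck` and the helper `hasEmbeddedQuotes` are made up for this illustration; they are not part of any Kafka API.

```java
public class QuotedKeyCheck {
    // Hypothetical helper: true if the string itself starts and ends
    // with a literal double-quote character.
    static boolean hasEmbeddedQuotes(String key) {
        return key.length() >= 2 && key.startsWith("\"") && key.endsWith("\"");
    }

    public static void main(String[] args) {
        String expectedKey = "100";    // three characters: 1, 0, 0
        String actualKey = "\"100\"";  // five characters: ", 1, 0, 0, "

        System.out.println(expectedKey.length());             // 3
        System.out.println(actualKey.length());               // 5
        System.out.println(expectedKey.equals(actualKey));    // false
        System.out.println(hasEmbeddedQuotes(actualKey));     // true
    }
}
```

A check like this on a few consumed keys can quickly confirm whether the quotes are really part of the key or only an artifact of how the key was printed.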

On the other hand, the value in your orders topic is the JSON document {"ID":"1","Name":"Myorder1","CustID":"100"}, and its CustID attribute is a String, this time represented in Java as "100".

When you join orders and customers, you try to match the order's CustID 100 with the Customer key "100". This fails because the key contains double-quotes that are missing from the CustID.
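The two symptoms from the question (no output with join, a null Customer with leftJoin) can be reproduced with a plain map lookup. This is only a sketch of the lookup semantics, not Kafka Streams code: the class `JoinLookupSketch` and its methods are hypothetical, and the maps stand in for the orders stream (order ID to CustID) and the customers table (key to CustName).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinLookupSketch {
    // Inner-join semantics: an order with no matching customer key is dropped silently.
    static List<String> innerJoin(Map<String, String> orders, Map<String, String> customers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> order : orders.entrySet()) {
            String custName = customers.get(order.getValue());
            if (custName != null) {
                out.add(order.getKey() + ":" + custName);
            }
        }
        return out;
    }

    // Left-join semantics: every order is emitted; the customer side is null without a match.
    static List<String> leftJoin(Map<String, String> orders, Map<String, String> customers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> order : orders.entrySet()) {
            String custName = customers.get(order.getValue()); // may be null
            out.add(order.getKey() + ":" + custName);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> orders = new LinkedHashMap<>();
        orders.put("1", "100");
        orders.put("2", "200");

        // Customer keys as they appear in the question: the quotes are part of the key.
        Map<String, String> quotedCustomers = new LinkedHashMap<>();
        quotedCustomers.put("\"100\"", "Customer1");
        quotedCustomers.put("\"200\"", "Customer2");

        // Customer keys as expected: no embedded quotes.
        Map<String, String> plainCustomers = new LinkedHashMap<>();
        plainCustomers.put("100", "Customer1");
        plainCustomers.put("200", "Customer2");

        System.out.println(innerJoin(orders, quotedCustomers)); // []
        System.out.println(leftJoin(orders, quotedCustomers));  // [1:null, 2:null]
        System.out.println(innerJoin(orders, plainCustomers));  // [1:Customer1, 2:Customer2]
    }
}
```

With the quoted keys, the inner join yields nothing and raises no error. The left join hands a null customer to the joiner, which would explain the NullPointerException, assuming the ValueJoiner dereferences customer.CustName without a null check.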

Val Bonn

@deepak you may need to materialize your KTable

builder.table(customers, Materialized.as(customerStore));

Then stream the orders and build your join.

  • I am using `GlobalKTable<String, Customer> customers = builder.globalTable("Customer", Consumed.with(Serdes.String(), customerSerde), Materialized.<String, Customer, KeyValueStore<Bytes, byte[]>>as("my-state-store"))`. Please suggest in case we need to use it differently. – deepak Jun 06 '19 at 04:54
  • Make sure that you really need a GlobalKTable. If you switch the key used in the Orders topic to "CustID", you can use a KTable instead (this gives better performance). If both topics have the same partitioning strategy and the same key (in your case "CustID"), you can use a KTable rather than a GlobalKTable. Please refer to this answer by Matthias J. Sax, https://stackoverflow.com/questions/45975755/what-are-the-differences-between-ktable-vs-globalktable-and-leftjoin-vs-outerj – aishwarya kumar Jun 10 '19 at 13:26