12

I have a Kafka Streams application that consumes records published to the topic user_activity. It receives JSON data, and depending on the value stored under a particular key I want to push each record to a different topic.

This is my streams App code:

KStream<String, String> source_user_activity = builder.stream("user_activity");
        source_user_activity.flatMapValues(new ValueMapper<String, Iterable<String>>() {
            @Override
            public Iterable<String> apply(String value) {
                System.out.println("value: " +  value);
                ArrayList<String> keywords = new ArrayList<String>();
                try {
                    JSONObject send = new JSONObject();
                    JSONObject received = new JSONObject(value);

                    send.put("current_date", getCurrentDate().toString());
                    send.put("activity_time", received.get("CreationTime"));
                    send.put("user_id", received.get("UserId"));
                    send.put("operation_type", received.get("Operation"));
                    send.put("app_name", received.get("Workload"));
                    keywords.add(send.toString());
                    // apply regex to value and for each match add it to keywords

                } catch (Exception e) {
                    // TODO: handle exception
                    System.err.println("Unable to convert to json");
                    e.printStackTrace();
                }

                return keywords;
            }
        }).to("user_activity_by_date");

In this code, I want to check the operation type and, depending on its value, push the record to the relevant topic.

How can I achieve this?

EDIT:

I have updated my code to this:

final StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> source_o365_user_activity = builder.stream("o365_user_activity");
KStream<String, String>[] branches = source_o365_user_activity.branch( 
      (key, value) -> (value.contains("Operation\":\"SharingSet") && value.contains("ItemType\":\"File")),
      (key, value) -> (value.contains("Operation\":\"AddedToSecureLink") && value.contains("ItemType\":\"File")),
      (key, value) -> true
     );

branches[0].to("o365_sharing_set_by_date");
branches[1].to("o365_added_to_secure_link_by_date");
branches[2].to("o365_user_activity_by_date");
el323

3 Answers

18

You can use the branch method to split your stream. It takes a list of predicates and returns one sub-stream per predicate; each record goes to the sub-stream of the first predicate it matches.

The code below is taken from kafka-streams-examples:

KStream<String, OrderValue>[] forks = ordersWithTotals.branch(
    (id, orderValue) -> orderValue.getValue() >= FRAUD_LIMIT,
    (id, orderValue) -> orderValue.getValue() < FRAUD_LIMIT);

forks[0].mapValues(
    orderValue -> new OrderValidation(orderValue.getOrder().getId(), FRAUD_CHECK, FAIL))
    .to(ORDER_VALIDATIONS.name(), Produced
        .with(ORDER_VALIDATIONS.keySerde(), ORDER_VALIDATIONS.valueSerde()));

forks[1].mapValues(
    orderValue -> new OrderValidation(orderValue.getOrder().getId(), FRAUD_CHECK, PASS))
    .to(ORDER_VALIDATIONS.name(), Produced
        .with(ORDER_VALIDATIONS.keySerde(), ORDER_VALIDATIONS.valueSerde()));
deFreitas
codejitsu
  • What is `(id, orderValue)`? I was looking at the Kafka Streams documentation and it's something like `(key, value) -> predicate()`. But I have a JSON object in the value, and that JSON object contains multiple keys and values. So how can I branch depending on that? – el323 Feb 24 '18 at 08:46
  • This is just an example. Take a look at this: https://kafka.apache.org/0100/javadoc/org/apache/kafka/streams/kstream/KStream.html#branch(org.apache.kafka.streams.kstream.Predicate...) Ok, I see. The number of predicates is static, so you can split the source stream into a predefined number of substreams, I guess. If you need dynamic splitting, you have to redesign the logic, I am afraid. Regarding your question: you can define a list of predicates like _hasField_(...) or something similar. Each predicate checks whether the field is present in the JSON. – codejitsu Feb 24 '18 at 10:26
  • Let's say I want to send all the records to the third topic. Does branch allow this? – el323 Feb 26 '18 at 05:58
  • @EL323 yes, you just create a predicate evaluating to true for every record. – codejitsu Feb 26 '18 at 07:40
  • Like I have done in my EDIT? But it's not getting the other logs. It's only getting the logs for which the first two predicates evaluate to false. – el323 Feb 26 '18 at 08:03
  • @EL323 Ah, I see. I think it's not possible with branch, because the first matching predicate will "catch" the record. I guess, all you need is a branch with two predicates like in your edit. In order to forward all your messages to the third topic I'd just use something like source_o365_user_activity.to(...) – codejitsu Feb 26 '18 at 09:34
  • @codejitsu can we make the branches dynamic, so that we get n branches based on a list in a DB? – Bilal Siddiqui Jun 03 '21 at 06:43
  • Since version 2.8, `KStream.branch(Predicate...)` is deprecated. `KStream.split().branch(Predicate)` should be used instead. – nanachimi Dec 13 '22 at 20:09
  • An example with the new version: https://developer.confluent.io/tutorials/split-a-stream-of-events-into-substreams/kstreams.html – nanachimi Dec 13 '22 at 21:05
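The first-match behaviour discussed in the comments above can be modelled in plain Java without any Kafka dependency. This is only a sketch under stated assumptions: `BranchModel`, `hasField` and `topicsFor` are hypothetical names, the topic names are taken from the question's EDIT, and the substring check mirrors the question's `value.contains(...)` approach rather than parsing the JSON properly. The trailing comment shows how the same routing might be wired with the `split()` API mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Plain-Java model of first-match branching (not Kafka code).
class BranchModel {

    // Hypothetical helper: naive substring test for "field":"expected" in raw JSON text.
    // A real JSON parser would be more robust against whitespace and nesting.
    static BiPredicate<String, String> hasField(String field, String expected) {
        String needle = "\"" + field + "\":\"" + expected + "\"";
        return (key, value) -> value != null && value.contains(needle);
    }

    // Topic -> predicate, kept in insertion order because evaluation order matters.
    static final Map<String, BiPredicate<String, String>> BRANCHES = new LinkedHashMap<>();
    static {
        BRANCHES.put("o365_sharing_set_by_date",
                hasField("Operation", "SharingSet").and(hasField("ItemType", "File")));
        BRANCHES.put("o365_added_to_secure_link_by_date",
                hasField("Operation", "AddedToSecureLink").and(hasField("ItemType", "File")));
    }

    // Returns every topic a record should be written to.
    static List<String> topicsFor(String key, String value) {
        List<String> topics = new ArrayList<>();
        // Every record goes to the general topic, independently of the branches.
        topics.add("o365_user_activity_by_date");
        for (Map.Entry<String, BiPredicate<String, String>> branch : BRANCHES.entrySet()) {
            if (branch.getValue().test(key, value)) {
                topics.add(branch.getKey());
                break; // the first matching predicate "catches" the record
            }
        }
        return topics;
    }
}

// Possible Kafka Streams wiring (2.8+), shown as a comment because it needs the
// kafka-streams dependency; p1 and p2 stand for the two predicates above:
//
//   source.to("o365_user_activity_by_date");
//   source.split()
//         .branch(p1, Branched.withConsumer(ks -> ks.to("o365_sharing_set_by_date")))
//         .branch(p2, Branched.withConsumer(ks -> ks.to("o365_added_to_secure_link_by_date")))
//         .noDefaultBranch();
```

Writing the source stream to the general topic separately, instead of adding a catch-all third predicate, is what lets matched records reach both their specific topic and the general one.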
6

The original KStream.branch method is inconvenient because it mixes arrays and generics, and because it forces one to use 'magic numbers' to extract the right branch from the result (see e.g. the KAFKA-5488 issue). Starting from spring-kafka 2.2.4, the KafkaStreamBrancher class is available. With it, more convenient branching is possible:

        
new KafkaStreamBrancher<String, String>()
    .branch((key, value) -> value.contains("A"), ks->ks.to("A"))
    .branch((key, value) -> value.contains("B"), ks->ks.to("B"))
    .defaultBranch(ks->ks.to("C"))
    .onTopOf(builder.stream("source"))
    //onTopOf returns the provided stream so we can continue with method chaining 
    //and do something more with the original stream

There is also KIP-418, so there is a chance that branching will be improved in Kafka itself in future releases.

kolya_metallist
Ivan Ponomarev
1

Another possibility is routing the event dynamically using a TopicNameExtractor:

https://www.confluent.io/blog/putting-events-in-their-place-with-dynamic-routing

You would need to have created the topics in advance, though:

val outputTopic: TopicNameExtractor[String, String] = (_, value: String, _) => defineOutputTopic(value)

builder
  .stream[String, String](inputTopic)
  .to(outputTopic)

Here defineOutputTopic can return one of a defined set of topics given the value (or the key or the record context, for that matter). P.S.: sorry for the Scala code; the linked post has a Java example.
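For readers who prefer Java, the routing function itself can be sketched as a plain static method. This is a minimal sketch, not the linked post's code: `TopicRouter` is a hypothetical class name, the topic names are borrowed from the question's EDIT, and the substring checks assume the JSON is serialized without whitespace between keys and values, as in the question.

```java
// Pure routing logic, kept free of Kafka types so it can be unit-tested on its own.
class TopicRouter {

    // Fallback topic for records that match no specific rule (name from the question).
    static final String DEFAULT_TOPIC = "o365_user_activity_by_date";

    // Picks an output topic from the raw JSON value.
    // Assumption: compact JSON such as {"Operation":"SharingSet","ItemType":"File"}.
    static String defineOutputTopic(String value) {
        if (value == null || !value.contains("\"ItemType\":\"File\"")) {
            return DEFAULT_TOPIC;
        }
        if (value.contains("\"Operation\":\"SharingSet\"")) {
            return "o365_sharing_set_by_date";
        }
        if (value.contains("\"Operation\":\"AddedToSecureLink\"")) {
            return "o365_added_to_secure_link_by_date";
        }
        return DEFAULT_TOPIC;
    }
}

// Possible wiring in a topology (needs the kafka-streams dependency, so shown as a comment).
// TopicNameExtractor has a single method, extract(key, value, recordContext), so a lambda fits:
//
//   builder.<String, String>stream("o365_user_activity")
//          .to((key, value, recordContext) -> TopicRouter.defineOutputTopic(value));
```

Unlike branching, this sends each record to exactly one topic, so the set of names the function can return should match the topics created in advance.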

Rafael