
My understanding is that I can filter data with a stream processor and write it to specific topics.

Problem: The producer sends records with a `country` field. Stream processing then filters these records and writes them to topics by country code.

As a result, consumers that are subscribed to specific country codes receive only the messages for those countries.

The problem is that this requires as many topics as there are countries, and in the future I will need to do the same with countries.

How should I organize this in Kafka and filter the data?

Robin Moffatt
  • Why would you need multiple topics? Can you give an example? – OneCricketeer Sep 29 '21 at 12:45
  • One topic contains orders for the USA, another for the UK, etc. –  Sep 29 '21 at 13:05
  • Sure, but why do those need to be topics instead of keys of records in partitions in one `country` topic? – OneCricketeer Sep 29 '21 at 13:06
  • Because on the other side there are consumers (web and mobile apps) that subscribe to the specific countries they want to receive orders from. –  Sep 29 '21 at 13:08
  • You can have separate consumer groups and assign consumer instances to single partitions (or groups of partitions) that only store specific country codes – OneCricketeer Sep 29 '21 at 13:09
  • Either way, countries have changed over the course of history, and it's unclear to me how you account for adding new topics or partitions or remapping old data if/when this happens – OneCricketeer Sep 29 '21 at 13:12
  • Yes, there are cases when an order appears with a different country name. –  Sep 29 '21 at 13:19
  • Sure, but let's say Scotland declares independence from the UK and you've got previous data that's written with UK country code that should be removed and written to some new SCO country code... Just something to think about – OneCricketeer Sep 29 '21 at 13:21
  • Yes, you are right :) –  Sep 29 '21 at 13:27
  • Do you have any examples similar to mine? –  Sep 29 '21 at 14:14
  • The below answer is "correct". Not sure what else you're looking for – OneCricketeer Sep 29 '21 at 15:04

1 Answer


You have a few options here:

Kafka Streams: With Kafka Streams you can filter the data as needed and write it to new topics. Consumers then consume messages from those new topics.
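As a rough illustration of the routing step in this option, here is a minimal Python sketch that decides which topic each record would go to. It does not talk to Kafka at all; the `orders.<code>` naming scheme, the `prefix` parameter, and the record shape (a dict with a `country` field) are assumptions made for the example.

```python
from collections import defaultdict


def topic_for(record, prefix="orders"):
    """Return the destination topic for one record (hypothetical naming scheme)."""
    return f"{prefix}.{record['country'].lower()}"


def route(records, prefix="orders"):
    """Group records by destination topic, preserving their original order."""
    routed = defaultdict(list)
    for record in records:
        routed[topic_for(record, prefix)].append(record)
    return dict(routed)
```

In a real stream processor, each record would be produced to the topic returned by `topic_for` instead of being collected in a dict.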

Filter data on the consumer side: Every consumer reads the full topic and discards the messages that do not match its criteria.
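A minimal sketch of consumer-side filtering might look like the following. It assumes the message values are JSON-encoded objects with a `country` field, which the question implies but does not spell out; `filter_orders` is a hypothetical helper, and the iterable of raw values stands in for whatever your Kafka client yields.

```python
import json


def filter_orders(raw_values, countries):
    """Yield only the decoded orders whose country is in `countries`.

    `raw_values` is any iterable of JSON strings or bytes, for example
    the message values read from a consumer loop.
    """
    wanted = set(countries)
    for raw in raw_values:
        order = json.loads(raw)
        if order.get("country") in wanted:
            yield order
```

The trade-off is that every consumer still receives and decodes every message, so this is simplest to set up but the most wasteful on network and CPU.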

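Use separate partitions for separate country codes: Create the topic with as many partitions as there are country codes and use the country code as the record key. Keep in mind that the default partitioner hashes the key, so two codes can share a partition unless you set the partition explicitly or use a custom partitioner. Then point each consumer at the right partition to consume country-specific messages.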
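One way to get exactly one country per partition is to keep an explicit mapping from country code to partition number and use it on both the producer and the consumer side. The sketch below is plain Python with no Kafka dependency; `build_partition_map` and `partition_for` are hypothetical helper names, and the alphabetical ordering is just one deterministic choice.

```python
def build_partition_map(country_codes):
    """Assign each distinct country code its own partition number.

    Codes are sorted so that every process derives the same mapping
    from the same set of codes.
    """
    return {code: index for index, code in enumerate(sorted(set(country_codes)))}


def partition_for(country, partition_map):
    """Look up the partition for a country, failing loudly on unknown codes."""
    try:
        return partition_map[country]
    except KeyError:
        raise ValueError(f"no partition configured for country {country!r}") from None
```

With a client such as kafka-python, the returned number would typically be passed as the `partition` argument when sending, and a consumer would be assigned that same partition manually; check this against the client you actually use. Note that adding a country later changes the sorted order, so a production setup would more likely store the mapping than recompute it.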

SRJ
  • Thank you. I currently use the first option you described, so I need the third one, with one partition per country. How do I then push data from the consumer (a Python script) to the clients (web apps, mobile apps)? –  Sep 29 '21 at 13:18
  • @Aurica Use WebSockets or write the Kafka data to some type of database which your web/mobile clients will query via some API – OneCricketeer Sep 29 '21 at 15:05
  • Thank you. How do I decide which to use, WebSockets or a DB (pull)? Where can I find use cases? –  Sep 29 '21 at 15:13
  • This might help you https://stackoverflow.com/questions/5792966/data-pull-vs-push-oop-approach, https://stackoverflow.com/questions/34706186/push-pull-mechanism-observer-pattern – SRJ Sep 29 '21 at 16:51