To preface, I'm new to Apache Kafka, so please forgive any obvious mistakes or misunderstandings in the way I've posed this question.

Quick summary of what I'm trying to develop:

  • I'm developing an application that needs to continuously track the location of multiple users at once.
  • The application then needs to use ksqlDB to query the location of these users.

What I need clarification on:

  • As far as I understand, I can only really query one topic in ksqlDB (joins appear to be limited to two topics), so I would have only one topic for tracking users. I'm hoping to map each partition in this topic to a user-id, and I may have thousands of user-ids in this single topic.

What are the performance implications of having thousands of partitions in this single topic, and is there a better way to approach this problem?

Thank you!

  • I assume you are joining location data + id with full user-info? – OneCricketeer Aug 11 '21 at 15:56
  • Yes that would be correct! – Nikhilesh Belulkar Aug 11 '21 at 18:14
  • I don't know about KSQL, but I feel like the better approach might be to `map()` (e.g. using a UDF perhaps) the location+id data to "explode" the ID into a full user rather than keep it as a key... The only reason I can think of having ids as keys is if you needed ordering per-user, which seems unlikely since you probably only need ordering by time (which would be global over all partitions) – OneCricketeer Aug 11 '21 at 18:20
  • Newer versions of ksqlDB do support n-way joins... Why do you need a partition per user-id? -- In general, this blog post should help: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/ – Matthias J. Sax Aug 11 '21 at 19:03
  • @MatthiasJ.Sax Thank you. I thought I would need a partition per user_id to ensure that one user_id would not overwrite another. Is this not the case? – Nikhilesh Belulkar Aug 24 '21 at 23:49
  • Not sure if I can follow. `user_id` sounds like the primary key, and you want to track the position per user: thus, don't you want to update the position per user (i.e., "overwrite") -- why do you not want to overwrite? -- It's unclear from the original question. -- If you want to "update" per user-id, you would need to partition by user-id. -- Overall, it seems that the question is a little unclear. Maybe try the community chat instead of Stack Overflow: https://launchpass.com/confluentcommunity – Matthias J. Sax Aug 25 '21 at 05:34
  • @MatthiasJ.Sax I appreciate the help, thanks so much. That makes a lot of sense. – Nikhilesh Belulkar Aug 30 '21 at 00:02
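
To make the "partition by user-id" point from the comments concrete, here is a minimal Python sketch. It is not Kafka or ksqlDB code. It only illustrates two ideas: a record key is hashed onto a fixed number of partitions, so thousands of users can share a handful of partitions, and "latest value per key" means an update replaces only the earlier entry for the same user. The partition count, the sample events, and the function names `partition_for` and `latest_locations` are hypothetical. CRC32 is used as a stand-in hash; Kafka's default partitioner uses murmur2 on the key bytes.

```python
import zlib

NUM_PARTITIONS = 12  # hypothetical: far fewer partitions than users


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition as hash(key) mod partition count.

    CRC32 is only a stand-in hash for illustration; Kafka's default
    partitioner uses murmur2 on the serialized key.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def latest_locations(events):
    """Keep the most recent location per user_id.

    This mimics, in spirit, a table keyed by user_id: a new event for a
    user replaces that user's previous entry and nobody else's.
    """
    table = {}
    for user_id, lat, lon in events:
        table[user_id] = (lat, lon)
    return table


# Hypothetical sample events: (user_id, latitude, longitude)
events = [
    ("user-1", 40.71, -74.00),
    ("user-2", 51.50, -0.12),
    ("user-1", 40.73, -73.99),  # newer position for user-1
]

for user_id, _lat, _lon in events:
    # The same user_id always maps to the same partition,
    # which preserves per-user ordering.
    print(user_id, "-> partition", partition_for(user_id))

print(latest_locations(events))
```

Because the same key always lands in the same partition, per-user ordering holds without dedicating a partition to each user, and the partition count can be chosen from throughput needs rather than from the number of users.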

0 Answers