
We have a data warehouse on MySQL with dimension tables, fact tables and some aggregate tables. I want to implement the same structure using Apache Kafka, where the source would be another MySQL server. I want to address the following requirements:

  1. Add foreign keys to fact tables
  2. Add aggregate tables
  3. Handle schema changes

I have read that a dimension/fact-table architecture cannot be built well with Kafka, because a star schema (dimensions and fact tables) has to be loaded in order: first the dimensions, then the fact tables. If a fact row lands in the fact table before its dimension row lands in the dimension table, the foreign key would be NULL.

How can we handle these problems using Kafka Streams (from Python)?
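
For example, the only workaround I can picture from Python is a plain consumer that buffers fact rows until their dimension row has arrived, so the foreign key is never NULL. This is only a rough sketch; the topic names, key fields and load functions below are made up, and I don't know whether this is the right approach:

```python
# Rough sketch (not production code): buffer facts whose dimension key has not
# arrived yet, so the FK is never written as NULL. Topic names, key fields and
# the load functions are placeholders for illustration only.
import json
from collections import defaultdict

from confluent_kafka import Consumer  # assumes confluent-kafka-python is installed

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "dw-loader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["cdc.customers", "cdc.orders"])  # dimension topic, fact topic

dim_keys = set()                   # dimension keys already loaded into the dim table
pending_facts = defaultdict(list)  # facts parked until their dimension row arrives

def load_dimension(row):
    dim_keys.add(row["customer_id"])
    # ... upsert into the dimension table here ...

def load_fact(row):
    # ... insert into the fact table with the resolved FK here ...
    pass

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    row = json.loads(msg.value())

    if msg.topic() == "cdc.customers":                 # dimension record
        load_dimension(row)
        for fact in pending_facts.pop(row["customer_id"], []):
            load_fact(fact)                            # flush facts that were waiting
    else:                                              # fact record
        if row["customer_id"] in dim_keys:
            load_fact(row)
        else:
            pending_facts[row["customer_id"]].append(row)  # park until dim arrives
```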

Is it possible to do this using only Source & Sink connectors?
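
For context, this is roughly the source/sink pair I have in mind, registered through the Kafka Connect REST API. It is only a sketch, assuming Debezium as the MySQL source connector and the Confluent JDBC sink connector; host names, credentials, table and topic names are placeholders, and I am not sure this alone can produce the dimension/fact split:

```python
# Sketch: register a Debezium MySQL source and a JDBC sink via the Connect REST API.
# All host names, credentials, databases, tables and topics are placeholders.
import requests

CONNECT_URL = "http://localhost:8083/connectors"

mysql_source = {
    "name": "dw-mysql-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "source-mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "********",
        "database.server.id": "1001",
        # Debezium 1.x property names; newer versions rename some of these
        "database.server.name": "src",
        "table.include.list": "sales.customers,sales.orders",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.sales",
    },
}

jdbc_sink = {
    "name": "dw-mysql-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:mysql://warehouse-mysql:3306/dw",
        "connection.user": "dw_loader",
        "connection.password": "********",
        "topics": "src.sales.customers,src.sales.orders",
        # flatten the Debezium change-event envelope before writing rows
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "auto.create": "true",
        "auto.evolve": "true",  # lets the sink add new columns when the source schema changes
    },
}

for connector in (mysql_source, jdbc_sink):
    requests.post(CONNECT_URL, json=connector).raise_for_status()
```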

  • Kafka Streams isn't a Python library, but how far have you gotten so far with loading data into Kafka? – OneCricketeer May 24 '22 at 14:26
  • I have loaded change events into Kafka through the MySQL source connector, which reads the binlogs. The problem comes when we need to transform that transactional data into a star schema. – SHOAIB AHMED May 25 '22 at 02:08
  • I've seen suggestions in the past that the data should be normalized before being able to use Kafka effectively. Otherwise, you'll have Kafka Streams / KsqlDB joins (again, not Python) running in multiple places doing the same normalization that could be handled at the database layer, potentially doubling the storage needs of your kafka cluster. So no, it's not possible using only Connect – OneCricketeer May 25 '22 at 14:49

0 Answers