My company is developing a SaaS service to store events and provide analytics through dashboards.
Since we won't have deletes or updates, the idea is to create a columnar-based, OLAP architecture to benefit from compression and latency it provides, and PostgreSQL Citus is one platform we intend to evaluate.
The overall architecture is pretty standard: an API will receive the events and then store them on Kafka in JSON format. Then, those events will be sent to PostgreSQL. Some fields will be "jsonb" data type.
By reading the docs, the best practice is distribute tables by tenant id.
Just wanted to doucle-check a few things and would greatly appreciate someone's input:
- Does the architecture described above make sense? Is there anything we should change or pay attention to?
- Are there limitations in the number of nodes or shards that can be scaled out for this columnar approach?
- Is GIN index supported? (I believe it is, since it's not listed in 'Limitations')
Thanks!