I'm new to Citus and need advice on which column to choose as the distribution column when calling create_distributed_table.

Example 1: a snapshot table of orders, where every row holds the info for one unique order. Of the three fields order_id, create_date, and update_date, which one is the better choice?

Example 2: for a table of user access logs, such as clicks, should it be sequence_id or click_date?

Thanks!

1 Answer

"Choosing the distribution column for each table is one of the most important modeling decisions because it determines how data is spread across nodes."

from Citus Docs.

I suggest taking a look at the relevant part of the Citus docs: https://docs.citusdata.com/en/v10.2/sharding/data_modeling.html

Also, you can try different distribution columns, run some performance tests on each, and compare the results.
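
Before testing on a real cluster, a rough simulation can give a feel for how a candidate column spreads rows. This is only a sketch under loud assumptions: it uses zlib.crc32 as a stand-in hash (not the hash function Citus actually uses), an arbitrary count of 32 shards, and made-up order rows, so it shows the general effect of column cardinality rather than real Citus behavior.

```python
import zlib
from collections import Counter
from datetime import date, timedelta

SHARD_COUNT = 32  # arbitrary value chosen for this illustration


def shard_for(value, shard_count=SHARD_COUNT):
    """Map a value to a shard with a stand-in hash (NOT Citus's real hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % shard_count


def make_orders(n_orders=10_000, n_days=5):
    """Build hypothetical orders: unique order_id, create_date over a few days."""
    start = date(2022, 1, 1)
    return [
        {"order_id": i, "create_date": start + timedelta(days=i % n_days)}
        for i in range(n_orders)
    ]


def rows_per_shard(rows, column):
    """Count how many rows land on each shard when hashing the given column."""
    return Counter(shard_for(row[column]) for row in rows)


orders = make_orders()
by_id = rows_per_shard(orders, "order_id")
by_date = rows_per_shard(orders, "create_date")

# A column with few distinct values can only ever reach a few shards.
print("order_id:    shards used =", len(by_id), " largest shard =", max(by_id.values()))
print("create_date: shards used =", len(by_date), " largest shard =", max(by_date.values()))
```

With only five distinct dates, create_date can reach at most five shards, while the unique order_id can reach many more. Even spread is only one of the factors the docs discuss (query patterns and co-location matter too), so treat this as a first filter. In Citus itself the column is passed as the second argument, e.g. SELECT create_distributed_table('orders', 'order_id'); where the table name orders is hypothetical.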