We would like to use distributed tables in which some nodes hold many tenants while other nodes hold only one or two (e.g. place a massive, high-traffic tenant on a node by itself, but group several small tenants together on a shared node).
I see the DISTRIBUTE BY options HASH and MODULO, but I don't think either fits this requirement. There are other CREATE TABLE options (specifically DISTRIBUTED and DISTSTYLE, shown below), but I can't find any documentation or details on what these options mean. I also saw a post that mentioned a custom distribution function, but I can't find any other reference to it.
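To make the requirement concrete, here is a rough sketch in Python of the kind of placement logic we have in mind. It is not tied to any database API. The node names, the tenant ids, and the `node_for_tenant` helper are all made up for illustration, and the modulo fallback is only my assumption about how a MODULO-style distribution behaves.

```python
# Hypothetical explicit placement: tenant id -> node name.
# All ids and node names here are invented for illustration.
EXPLICIT_PLACEMENT = {
    1001: "node_big",        # massive, high-traffic tenant gets a node to itself
    2001: "node_shared_a",   # small tenants grouped together
    2002: "node_shared_a",
    2003: "node_shared_b",
}

# Nodes that receive every tenant without an explicit entry.
SHARED_NODES = ["node_shared_a", "node_shared_b"]


def node_for_tenant(tenant_id: int) -> str:
    """Return the pinned node for a tenant, else fall back to modulo placement."""
    if tenant_id in EXPLICIT_PLACEMENT:
        return EXPLICIT_PLACEMENT[tenant_id]
    # Assumed fallback: plain modulo over the shared nodes only,
    # so unpinned tenants never land on the dedicated node.
    return SHARED_NODES[tenant_id % len(SHARED_NODES)]


if __name__ == "__main__":
    for tenant in (1001, 2001, 2003, 3000, 3001):
        print(tenant, "->", node_for_tenant(tenant))
```

The point is that the mapping is a lookup we control rather than a pure function of the column value, which is what HASH or MODULO alone would give us.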
Questions: Is there a way to explicitly assign distribution column values to nodes, using the DISTRIBUTED or DISTSTYLE options or by any other means? Are custom distribution functions available or on the roadmap? (Bonus question: are there any links to details about DISTRIBUTED or DISTSTYLE?)
...
[
DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } } |
DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } |
DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name )
]