We would like to use distributed tables in which some nodes hold multiple tenants while other nodes hold only one or two (e.g. put a massive, high-traffic tenant on a node by itself, but group multiple small tenants together).

I see DISTRIBUTE BY options of HASH and MODULO, which I don't think fit this requirement. There are other CREATE TABLE options (specifically DISTRIBUTED and DISTSTYLE, shown below), but I can't seem to find documentation or details as to what these options mean. I saw a post referring to a custom distribution function, but I can't find any other references to it.
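To make the requirement concrete, here is a rough sketch in Python (not Postgres-XL code). The tenant IDs, datanode names, and function names are all made up for illustration, and the modulo function is only a simplified picture of value-based placement, not the actual Postgres-XL implementation.

```python
# Hypothetical datanode names, for illustration only.
NODES = ["datanode1", "datanode2", "datanode3"]


def modulo_placement(tenant_id: int) -> str:
    """Value-based placement: the node follows from the tenant ID alone,
    so we cannot choose which tenants end up sharing a node."""
    return NODES[tenant_id % len(NODES)]


# What we would like instead: an explicit tenant-to-node map.
# Tenant 1 is the large, high-traffic tenant and gets a node to itself.
EXPLICIT_PLACEMENT = {
    1: "datanode1",
    2: "datanode2",
    3: "datanode2",
    4: "datanode3",
    5: "datanode3",
}


def explicit_placement(tenant_id: int) -> str:
    """Look up the node that was assigned to this tenant by hand."""
    return EXPLICIT_PLACEMENT[tenant_id]


def tenants_per_node(place, tenant_ids):
    """Group tenant IDs by the node that the placement function picks."""
    result = {node: [] for node in NODES}
    for tenant_id in tenant_ids:
        result[place(tenant_id)].append(tenant_id)
    return result


if __name__ == "__main__":
    print(tenants_per_node(modulo_placement, range(1, 6)))
    print(tenants_per_node(explicit_placement, range(1, 6)))
```

With the modulo-style function, the large tenant 1 lands on the same node as tenant 4. With the explicit map it sits alone on datanode1, which is the behavior we are after.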

Questions: Is there a way to explicitly assign distribution column values to nodes using DISTRIBUTED or DISTSTYLE options or by other means? Are custom distribution functions available or on the roadmap? (Bonus question: Any links to details about DISTRIBUTED or DISTSTYLE?)

...
[ 
  DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } } |
  DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } |
  DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name )
]
markvr4
  • What if you partition your table by tenant and then assign the partitions to your node groups? Each tenant would then be using a partition sitting on its own datanode. – Balazs Gunics Nov 10 '18 at 13:48
  • What is the context for this need? It will make your life harder in a LOT of different ways and I cannot see any benefit to it. It would also make it nearly impossible to modify or expand the cluster, since Postgres-XL would not be controlling the placement of data. – BrianC Mar 22 '19 at 20:42
