How data node selection done in MySQL Multi-site clustering - Is it based on geographic proximity of data nodes in a node group

Question

I am exploring the option of deploying MySQL multi-site clustering. The MySQL cluster deployment has three sites and it has two node groups and each node group contains three data nodes with the NoOfReplicas=3. The 3 data nodes in a node group are placed in three different sites for geographic spread.

All three sites are accessing the MySQL cluster through the mysqld server and performs the SELECT and INSERT/UPDATE operations in a single table.

Question 1:

Which data node will be accessed when the queries are issued from a site. Will the query access the local data node sitting at the same site from where the query originates.

Question 2:

Using the EXPLAIN statement, I can understand which partition is being used in the query operation, but not the exact data node that gets accessed for the query. Is there any way to understand which data node in the node group accessed for the query.

Question 3:

Is there a way to set the site affinity / tagging for the data node selection in a node group.

First read : https://stackoverflow.com/help/how-to-ask then post question. — Ashish Detroja, Oct 23 '17 at 11:36

score 1 · Answer 1 · answered Oct 24 '17 at 12:48

Question 1: The answer depends on if the table is using the READ BACKUP feature or not. If not the query will almost always be sent to the primary replica independent of where it is.

For READ BACKUP in MySQL Server we will send the query to a data node in the same node group and on the same host. This will normally be automatic based on the same hostname being used for MySQL Server and data node. It is possible also to set a variable ndb_data_node_neighbour to be the data node that you are closest to (this is a config variable in the MySQL Server).

When the data node evaluates where to send the query to it will go to the local node if data resides there. Otherwise it will go to the primary replica node.

You can also use fully replicated tables in which case the data resides in each node and in this case it will always go to the a data node on the same host and it will find data there.

Question 2: The mapping of a partition to nodes and LDM threads is static. This information is available in the ndbinfo table table_fragments found at: https://dev.mysql.com/doc/mysql-cluster-excerpt/5.7/en/mysql-cluster-ndbinfo-table-fragments.html

Question 3: Interesting question, I have been working on such a feature quite recently. Whether it will be actually released and when is as usual not something that cannot be promised, but the idea is in line with your thoughts, one defines a LocationDomainId for each data node and MySQL Server and uses this to route read requests. Again it will only be applicable to tables that use the READ BACKUP feature or fully replicated tables.

Thanks for the reply. The definition given for FULLY_REPLICATED is “FULLY_REPLICATED controls whether the table is fully replicated, that is, whether each data node has a complete copy of the table”. In NDB cluster, each node contains all the partitions assigned to the node group either as primary partition or as the replica. Will Fully Replicated feature copy all the partitions that are assigned to the other node groups also to a data node. If this is the case, this deployment is equal to single node group with 3 nodes(for 3 replicas). Am I missing something here — Santhi, Oct 25 '17 at 09:52
FULLY_REPLICATED with one node group is the same as using READ_BACKUP. With two or more node groups the FULLY REPLICATED will be replicated on all node groups whereas READ_BACKUP only replicates rows within one node group as for normal tables. — Mikael Ronström, Nov 06 '17 at 09:15

How data node selection done in MySQL Multi-site clustering - Is it based on geographic proximity of data nodes in a node group

1 Answers1