
I've checked other similar issues, e.g. "deadlock in MySQL" on Stack Overflow, but nothing led to a solution.

REPLACE INTO db2.table2 (id, some_identifier_id, name, created_at, updated_at)
  (SELECT id, some_identifier_id, name, created_at, updated_at
   FROM db1.table1
   WHERE some_identifier_id IS NOT NULL
     AND some_identifier_id NOT IN
       (SELECT some_identifier_id FROM db2.table1
        WHERE some_other_identifier_id IS NOT NULL));

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Situation:

  1. All tables are InnoDB. db1.table1 uses collation latin1_swedish_ci; db2 uses utf8_unicode_ci.
  2. The query works fine on the development server, version 10.0.15-MariaDB.
  3. Let's assume I have 5 DB servers that share multi-master replication via a Galera cluster.
  4. I'm executing the query manually on any of those 5 servers and getting the error.
  5. That server's version is the same as the dev server where the query succeeds, i.e. 10.0.15-MariaDB.

Tried:

  1. Added LOCK IN SHARE MODE to the outer select, e.g. REPLACE INTO ... (1st select query (sub-query) LOCK IN SHARE MODE); it failed with the same message.
  2. INSERT / REPLACE ... (1st select query (sub-query LOCK IN SHARE MODE) LOCK IN SHARE MODE); it also failed with the same message.
  3. Tried ordering by id in the select query / sub-select query. Again, it failed with the same message.
  4. db1.table1 and db2.table1 both have only about 50k records, so size shouldn't be the issue, I guess.
  5. All tables have id as an auto-increment primary key, but I'm inserting the ids explicitly; note the query above.
  6. SHOW ENGINE INNODB STATUS; doesn't give me any useful hint.
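For reference, the locking variant from items 1 and 2 looked roughly like this (a sketch of what I tried, with the same column list as the original query):

```sql
-- Share locks on both the outer select and the sub-query (still deadlocked)
REPLACE INTO db2.table2 (id, some_identifier_id, name, created_at, updated_at)
  (SELECT id, some_identifier_id, name, created_at, updated_at
   FROM db1.table1
   WHERE some_identifier_id IS NOT NULL
     AND some_identifier_id NOT IN
       (SELECT some_identifier_id FROM db2.table1
        WHERE some_other_identifier_id IS NOT NULL
        LOCK IN SHARE MODE)
   LOCK IN SHARE MODE);
```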

The most probable reason could be the multi-master replication behind the Galera cluster and its optimistic locking (http://www.severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads). But should that make the query fail when executed on an individual node? Eventually I have to run the same query across the multi-master replication, but I guess if the basic issue is solved, the replicated servers won't be a problem anymore.

Note:

I need to do this without any temp table and without storing the sub-query's result in code. Other dependencies make a single query the most favorable approach for now.

1 Answer


Okay, I found a workaround. Based on my research and tests, I think there are two issues behind this failure.

  1. The REPLACE INTO query was syncing id along with the other required fields from db1.table1 to db2.table2. Inserting/replacing an auto-increment primary key is the most probable and obvious cause of deadlocks in Galera. I removed id from the query and kept some_identifier_id as a unique key so the same REPLACE still works, and the deadlock errors mostly stopped.

Do not rely on auto-increment values to be sequential. Galera uses a mechanism based on autoincrement increment to produce unique non-conflicting sequences, so on every single node the sequence will have gaps. https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/

  2. But the same deadlock message still appeared about 1 time in 10, and that is known Galera behaviour. Galera uses optimistic locking, which can occasionally lead to deadlocks; retrying the transaction is the generally suggested remedy in that case.

Galera Cluster uses at the cluster-level optimistic concurrency control, which can result in transactions that issue a COMMIT aborting at that stage. http://galeracluster.com/documentation-webpages/limitations.html
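Putting point 1 into practice, the reworked query drops id from the column list; it relies on some_identifier_id being a unique key on db2.table2 (the ALTER below is a hypothetical sketch of that constraint, assuming it doesn't already exist; adapt names as needed):

```sql
-- Assumption: some_identifier_id becomes the conflict target for REPLACE
ALTER TABLE db2.table2 ADD UNIQUE KEY uk_some_identifier_id (some_identifier_id);

-- Same sync, but letting the target table assign its own auto-increment ids
REPLACE INTO db2.table2 (some_identifier_id, name, created_at, updated_at)
  (SELECT some_identifier_id, name, created_at, updated_at
   FROM db1.table1
   WHERE some_identifier_id IS NOT NULL
     AND some_identifier_id NOT IN
       (SELECT some_identifier_id FROM db2.table1
        WHERE some_other_identifier_id IS NOT NULL));
```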

In a gist: the query ran successfully on an individual server, but under Galera it failed. Removing the auto-increment primary key from the query and restarting the transaction on deadlock solved the problem.
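The retry-on-deadlock part can be sketched generically. This Python helper is only an illustration; the assumption that the driver's exception carries MySQL/MariaDB error number 1213 in `args[0]` matches common connectors such as PyMySQL, but check yours:

```python
import time

DEADLOCK_ERRNO = 1213  # MySQL/MariaDB "Deadlock found when trying to get lock"

def run_with_retry(txn, max_attempts=5, backoff=0.1):
    """Run a transactional callable, retrying when it deadlocks.

    `txn` is any zero-argument callable that raises an exception
    carrying a MySQL error number in `args[0]` on failure (hypothetical
    driver shape; adjust the errno extraction for your connector).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            errno = exc.args[0] if exc.args else None
            # Re-raise anything that is not a deadlock, or the final failure
            if errno != DEADLOCK_ERRNO or attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff
```

Wrapping the single REPLACE statement in such a loop is enough here, since the whole sync is one auto-committed statement.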

[Edit]

  1. I've written a post explaining the schema, environment, issue, and how I worked around it. It may be useful to someone facing the same issue.

  2. The issue has been reported to the community and is open.