2

I have a large table in which I have to update several rows. I tried to use multiple threads to update different rows concurrently, but it seems MySql locks all the rows in the table instead of locking only those rows which are matched by 'where' clause. Hence the update is not concurrent and sometimes lead to lock timeout error.

Is there any way to update different rows concurrently?

I am using Django in backend to perform update operation. I can switch to Postgres, but will it help or will there still be locking problem?

Update: Added Code Snippet:

def process_calculate_training(base_dept, exch_dept):
    # First initialize training to None, this is not happening concurrently.
    Emp.objects.filter(exch_dept=exch_dept, base_dept=base_dept).update(training=None)
    # Compute training and store result
    pass


def start_calculating_training():
    thread_func_args = [['MECH', 'COE'], ['MECH', 'ECE'], ['MECH', 'ICE'], ['MECH', 'IT']]
    with ThreadPool(4) as p:
        p.starmap(process_calculate_training, thread_func_args)

Update: Executed 'show engine innodb status;'

Found that Django automatically set isolation level to 'READ-COMMITTED'. Hence MySql is locking only those rows which needs to be updated. But still the update is not happening concurrently as 2 threads seems to be on wait for the lock even if they have to update different rows.

I did show engine innodb status; during concurrent update operations. Here are the results:

TRANSACTIONS
------------
Trx id counter 2446659
Purge done for trx's n:o < 2446655 undo n:o < 0 state: running but idle
History list length 32
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421137001004688, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421137001002848, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421137001003768, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 2446658, ACTIVE 11 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 4601 lock struct(s), heap size 1024208, 7281 row lock(s), undo log entries 2165
MySQL thread id 427, OS thread handle 139661736408832, query id 651475 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'COE')
------- TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 345 page no 5238 n bits 240 index PRIMARY of table `empdb`.`sdk_emp` trx id 2446658 lock_mode X locks rec but not gap waiting
Record lock, heap no 135
------------------
---TRANSACTION 2446657, ACTIVE 11 sec fetching rows
mysql tables in use 1, locked 1
13971 lock struct(s), heap size 2351312, 15525 row lock(s), undo log entries 6307
MySQL thread id 424, OS thread handle 139661737813760, query id 651473 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'ECE')
---TRANSACTION 2446656, ACTIVE 11 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 879 lock struct(s), heap size 270544, 1985 row lock(s), undo log entries 411
MySQL thread id 426, OS thread handle 139661736810240, query id 651471 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'ICE')
------- TRX HAS BEEN WAITING 10 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 345 page no 1078 n bits 240 index PRIMARY of table `empdb`.`sdk_emp` trx id 2446656 lock_mode X locks rec but not gap waiting
Record lock, heap no 125
------------------
---TRANSACTION 2446655, ACTIVE 11 sec fetching rows
mysql tables in use 1, locked 1
13984 lock struct(s), heap size 2318544, 15263 row lock(s), undo log entries 6308
MySQL thread id 425, OS thread handle 139661737010944, query id 651469 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'IT')

And again running the command after some time:

TRANSACTIONS
------------
Trx id counter 2446659
Purge done for trx's n:o < 2446655 undo n:o < 0 state: running but idle
History list length 32
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421137001004688, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421137001002848, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421137001003768, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 2446658, ACTIVE 31 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 4601 lock struct(s), heap size 1024208, 7281 row lock(s), undo log entries 2165
MySQL thread id 427, OS thread handle 139661736408832, query id 651475 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'COE')
------- TRX HAS BEEN WAITING 29 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 345 page no 5238 n bits 240 index PRIMARY of table `empdb`.`sdk_emp` trx id 2446658 lock_mode X locks rec but not gap waiting
Record lock, heap no 135
------------------
---TRANSACTION 2446657, ACTIVE 31 sec updating or deleting
mysql tables in use 1, locked 1
27466 lock struct(s), heap size 4120784, 25388 row lock(s), undo log entries 12691
MySQL thread id 424, OS thread handle 139661737813760, query id 651473 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'ECE')
---TRANSACTION 2446656, ACTIVE 31 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 879 lock struct(s), heap size 270544, 1985 row lock(s), undo log entries 411
MySQL thread id 426, OS thread handle 139661736810240, query id 651471 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'ICE')
------- TRX HAS BEEN WAITING 30 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 345 page no 1078 n bits 240 index PRIMARY of table `empdb`.`sdk_emp` trx id 2446656 lock_mode X locks rec but not gap waiting
Record lock, heap no 125
------------------
---TRANSACTION 2446655, ACTIVE 31 sec fetching rows
mysql tables in use 1, locked 1
27489 lock struct(s), heap size 4169936, 25817 row lock(s), undo log entries 12692
MySQL thread id 425, OS thread handle 139661737010944, query id 651469 localhost root updating
UPDATE `sdk_emp` SET `training` = NULL WHERE (`sdk_emp`.`base_dept` = 'MECH' AND `sdk_emp`.`exch_dept` = 'IT')

I am not able to understand why 2 threads are not able to get locks as they will be updating different rows? And what is the solution for it?

  • If your intent is to have all updates succeed or all updates fail, use transactions. Also, Django and Postgres are not alternatives; MySQL and Postgres are alternatives, and Django can be used with either. – rd_nielsen Jul 21 '18 at 12:32
  • @rd_nielsen I meant to ask if using Postgres instead of MySql will solve the problem or not. – Anshul Gupta Jul 21 '18 at 18:54
  • You can update multiple rows with a single query from django by using `QuerySet.update()` Can you show us your python code? It's very likely that it's possible to improve performance significantly by constructing the query in a different way in your django code. Or are you using raw sql to perform the update? – Håken Lid Jul 21 '18 at 18:56
  • Why would you use multiple threads here? RDBMS's such as mysql and postgresql have built in query optimization (and parallelism), which you should take advantage of instead. Your multithread approach might in fact make it harder for the database query optimizer to do its job. If you can do something with a single database query, that will most often be the most performant solution. – Håken Lid Jul 21 '18 at 19:07
  • @HåkenLid My multithread approach is a test simulation of what I will need in future. Basically there will be celery workers and an API. Whenever a user will hit the API, it will send an update query to Db. If multiple users hit the API, I would like to execute the update in parallel if updated rows are non overlapping. – Anshul Gupta Jul 21 '18 at 19:37
  • Ah. That makes more sense. But in any case, you should include the actual python code in the question. It will make it easier to understand what kind of operations you are doing. – Håken Lid Jul 21 '18 at 19:44
  • 1
    Apart from my use case, I would really like to understand why processes are getting stuck on locks when the rows needed to be updated are different? And concurrent updates on different rows is a very basic use case (For eg: different people updating their profile in Fb simultaneously). There should be some way to do it elegantly. – Anshul Gupta Jul 21 '18 at 19:45

1 Answers1

0
  1. Make sure your table is well indexed and your update query is not doing full table scan or full index scan. By this way, your statement may requires less lock. Update by primary key or unique key if possible.

Doing EXPLAIN on your statement will help you know your query's efficiency.

  1. MySQL default isolation level is REPEATABLE-READ and if you are using transaction, it will keep the row lock for rows it touched until the transaction ends, even the row doesn't match.

Change isolation level to READ-COMMITTED can solve this issue that the row lock will immediately released if not matching.

Jacob
  • 1,776
  • 14
  • 11
  • Thanks for the tip. Added index (updating by primary key is not possible in my case) and found that Django set isolation level to READ-COMMITTED by default. But the problem still persists. See my updates on question. – Anshul Gupta Jul 21 '18 at 18:56
  • See this line: mysql tables in use 1, locked 1. You have the table locked. It usually should be 0. – Jacob Jul 22 '18 at 03:36
  • What's your table's primary key? – Jacob Jul 22 '18 at 03:39
  • Primary key is 'id' field created by default by django. If the table was locked by each process, then why 2 processes were able to get locks and proceed smoothly? Only 1 of them should be able to proceed. – Anshul Gupta Jul 22 '18 at 04:51
  • @AnshulGupta this is the reason. Is there any index on the two columns: base_dept and exch_dept? – Jacob Jul 22 '18 at 05:42