
I have a situation involving a large point-of-sale / reservations system I run. Customers have running balances which are updated by a trigger when a purchase transaction occurs. That is to say, there are triggers on insert, update and delete on a table units which stores purchased units, and each of these triggers sums the customer's purchases and payments and updates a table called customer_balance. Keeping these balances updated without having to calculate them through joins constantly is essential. The business logic also dictates that sometimes multiple inserts or updates to units pertaining to a single customer are sent asynchronously. When this happens it can lead to a deadlock where two triggers try to act on the same row in customer_balance. To prevent this, and since you can't put transactions in a trigger, there are a handful of very quick updates for which I call LOCK TABLES units WRITE, customer_balance WRITE, then do the purchase and let the trigger run, and then immediately UNLOCK TABLES. These locks are held briefly and sporadically enough that they don't affect performance even during busy periods.
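Simplified, my setup looks roughly like this (column names here are made up, payments are omitted, and the real triggers also cover update and delete):

```sql
-- Sketch only; the real schema and trigger bodies are more involved.
CREATE TRIGGER units_after_insert AFTER INSERT ON units
FOR EACH ROW
  UPDATE customer_balance
     SET balance = (SELECT COALESCE(SUM(amount), 0)
                      FROM units
                     WHERE customer_id = NEW.customer_id)
   WHERE customer_balance.customer_id = NEW.customer_id;

-- The quick purchases that were deadlocking are wrapped like this:
LOCK TABLES units WRITE, customer_balance WRITE;
INSERT INTO units (customer_id, amount) VALUES (42, 19.95);  -- trigger fires here
UNLOCK TABLES;
```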

However: there are certain quarterly and annual reports that need to run extensive, very long SELECTs on these same tables, some taking as long as 30 seconds. When running the reports, I SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. This seems to have mitigated a lot of the other issues where the reports were holding implicit or row-level locks. But they still block the explicit LOCK TABLES calls, so when these reports are run during peak hours, checkout transactions can freeze for the entire time the report is running.

My question is: Beyond setting isolation level to READ UNCOMMITTED, is there any way to prevent a long-running SELECT from blocking an explicit write lock?

joshstrike

2 Answers


The table locks acquired by LOCK TABLES are metadata locks. Any query holds a shared metadata lock on the table, and will block other sessions from doing LOCK TABLES, just as it blocks DDL statements like ALTER TABLE, RENAME TABLE, TRUNCATE TABLE, or even a CREATE TRIGGER that references that table.
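You can see the metadata locks for yourself in performance_schema (the metadata-lock instrument is on by default in MySQL 8.0; in 5.7 you may need to enable it first):

```sql
-- In MySQL 5.7 the instrument may need to be enabled first:
-- UPDATE performance_schema.setup_instruments
--    SET ENABLED = 'YES', TIMED = 'YES'
--  WHERE NAME = 'wait/lock/metadata/sql/mdl';

-- While the long report SELECT runs, every session touching the tables
-- shows up here with a GRANTED or PENDING metadata lock:
SELECT object_schema, object_name, lock_type, lock_status, owner_thread_id
  FROM performance_schema.metadata_locks
 WHERE object_name IN ('units', 'customer_balance');

-- The sys schema can show blocked/blocking pairs directly:
SELECT * FROM sys.schema_table_lock_waits;
```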

This has nothing to do with the READ UNCOMMITTED isolation level for the transaction. In 30 years of using SQL databases, I have never found a good use for READ UNCOMMITTED. I recommend avoiding that isolation level. It doesn't reduce locking requirements, and it only leads to dirty reads, because your query may read data in a state that has incomplete updates, or which may be rolled back.

To solve your long-running queries that block updates, the typical solution is to create a replica database instance where you run the long report queries. Even if the report queries block updates on the replica, that's okay, because it doesn't block them on the source instance. Replication is asynchronous, so once the report query is done, the updates resume and gradually catch up.
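For example, the report sessions would connect to the replica instead of the source, and once a report finishes you can check how far the replica has to catch up (statements below use the MySQL 8.0.22+ wording; older versions use SHOW SLAVE STATUS and Seconds_Behind_Master):

```sql
-- On the replica: optionally keep it read-only so only reports run there.
SET GLOBAL super_read_only = ON;

-- After a long report query finishes, see how far replication lags.
SHOW REPLICA STATUS\G   -- look at Seconds_Behind_Source
```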

Another common solution is to use ETL to move the data to a data warehouse, where report queries over large datasets are more efficient anyway because of the way the data is stored.

Bill Karwin
  • Thanks. I've been trying to avoid replication just to run these reports, if possible; setting it up would just add to management workload and headache. Am I mistaken or doesn't READ UNCOMMITTED at least prevent the query from blocking other transactions from writing? – joshstrike Mar 28 '21 at 01:11
  • Nope. Really no. Any query or transaction against a table results in a shared metadata lock. Other sessions can also have shared metadata locks, but a session that requires an exclusive metadata lock has to wait until no session has any kind of metadata lock. – Bill Karwin Mar 28 '21 at 04:19
  • I would LOVE if you could expand on this for me. If the lowest isolation level read is ignoring ongoing transactions, just getting the dirtiest state as it passes each file, then why should it need to touch metadata or interfere with anything else? I understand how it could prevent a full table WRITE lock, and how it would lead to missing data... but how can it interfere with other reads or row-level transactions? Or, I mean, why does it have to? I'm trying to strip the raw, incomplete version of every row whether it's being written or not. Shouldn't that be a simple copy operation? – joshstrike Mar 28 '21 at 07:52
  • If you're querying a table, regardless of what isolation level you use, you don't want the table to be altered or renamed or dropped while you're querying it. So your session acquires a shared metadata lock to block those DDL operations. Concurrent sessions can all query the table at the same time because they don't block each other. They are _shared_ metadata locks. But any DDL statement needs an _exclusive_ metadata lock, meaning it has to wait until all sessions are done with the table. – Bill Karwin Mar 28 '21 at 08:26
  • You might like my presentation [InnoDB Locking Explained with Stick Figures](https://www.slideshare.net/billkarwin/innodb-locking-explained-with-stick-figures) – Bill Karwin Mar 28 '21 at 08:27
  • Hah! That's clever. Good information. One more question... does SELECT FOR UPDATE need to wait for a metadata lock? I'm wondering if running that in the trigger might prevent my deadlock situation and not have to wait on some report that's running huge table scans. – joshstrike Mar 28 '21 at 21:43
  • All queries have to wait for a metadata lock. For example, if there's an ALTER TABLE in progress and it's taking a long time, it will block queries. And vice-versa: long-running queries, whether they are locking queries or not, hold a metadata lock, and block DDL. – Bill Karwin Mar 28 '21 at 22:13
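To make the point in these comments concrete, the blocking can be reproduced with three sessions (the report query below is only a stand-in, and the column names are illustrative):

```sql
-- session 1: long-running report query; it holds a shared metadata lock
-- on units for as long as it runs, regardless of isolation level
SELECT customer_id, SUM(amount) FROM units GROUP BY customer_id;

-- session 2: the checkout path; LOCK TABLES needs a stronger metadata lock
-- and has to wait for session 1 to finish
LOCK TABLES units WRITE, customer_balance WRITE;

-- session 3: observe the wait; session 2 shows the state
-- "Waiting for table metadata lock"
SHOW PROCESSLIST;
```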
  1. Do not use LOCK TABLES with InnoDB. It's a sledgehammer. Sledgehammers are dangerous, as you are finding out.
  2. Test for deadlock. When one occurs, replay the entire transaction (see the sketch after this list). The replay is likely to be faster than waiting for the unlock.
  3. Leave the transaction_isolation at the default (probably).
  4. Build and maintain summary tables. This way those 30+ second reports won't take nearly that long. http://mysql.rjweb.org/doc.php/summarytables
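A sketch of the retry in item 2, written as a stored procedure (column names and the retry count are illustrative only; the same loop can live in application code instead):

```sql
DELIMITER //

CREATE PROCEDURE record_purchase(IN p_customer_id INT, IN p_amount DECIMAL(10,2))
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE done BOOLEAN DEFAULT FALSE;

  WHILE NOT done AND attempts < 3 DO
    BEGIN
      -- 1213 ER_LOCK_DEADLOCK maps to SQLSTATE '40001'
      DECLARE EXIT HANDLER FOR SQLSTATE '40001'
      BEGIN
        ROLLBACK;
        SET attempts = attempts + 1;
      END;

      START TRANSACTION;
      INSERT INTO units (customer_id, amount) VALUES (p_customer_id, p_amount);
      -- the balance trigger fires as part of the INSERT above
      COMMIT;
      SET done = TRUE;
    END;
  END WHILE;

  IF NOT done THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'purchase failed after 3 deadlock retries';
  END IF;
END//

DELIMITER ;
```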
Rick James
  • I was aware it was a sledgehammer when I used it, and it was used judiciously. Rerunning the transaction doesn't fit the bill because we may have several disparate queries doing different things to the customer's balance at the same time; even a single rerun might just throw them back into deadlock. Summary type tables are obviously the go-to solution, but I'd like to find a way to do this off the original data without setting up harvesting cron processes. There are over 60 reports in this software and I don't relish relinking them all to replicas or summaries. – joshstrike Mar 28 '21 at 07:56
  • @joshstrike - A comment on "60 reports": A daily summary table can easily work for weekly, monthly, and yearly; for that matter, any arbitrary range of days. Also, I often find that 2 or 3 'reports' can be derived from a summary table that is designed with the 2-3 in mind. So, the 60 might shrink down to a dozen. – Rick James Mar 28 '21 at 15:13
  • you might be right for some of the more minor reports, but the report generation instructions as written are actually stored in the DB as sets of expected arguments and SQL statements to apply to the data. There's a lot of corporate desire for adding more and more reports, and no desire whatsoever to refactor something that "works". Also a lot of the data is from the same few tables but the reports are aggregating it in different ways with granular time-based rule sets, e.g., number of customers who stayed for more than 5,6, and 7 consecutive nights in the past 30, 60, and 90 days. – joshstrike Mar 28 '21 at 21:51
  • Well, the bosses can continue to use the slow mechanism that they have. Meanwhile, you refactor them as you see fit. Eventually, they will understand your worth. When I was working in that mode, I could sometimes whip out a new report in an hour -- if I already had a summary table that came close. – Rick James Mar 29 '21 at 03:37
  • @joshstrike - The 5,6,7 & 30, 60, 90 all come from a single table -- When someone leaves, compute how many consecutive nights he stayed. The table would contain (perhaps) these columns: date_of_departure, num_nights, count_of_users. Poof, 5,6,7, or anything else together with 30,60,90,7,365,whatever. You give them one page and have them plug in two numbers. If that was 9 of your 60, see how fast this might go? – Rick James Mar 29 '21 at 03:43
  • @joshstrike - And... What if tomorrow they want it broken down by which credit card was used? Add another column (various credit cards, cash, bitcoin, etc). Now the same (after modification and rebuild) handles this query, the original 9 queries, plus queries all payment methods. – Rick James Mar 29 '21 at 03:48
  • But, be cautious. A Rule of Thumb is to aim for a summary table to have about one-tenth the number of rows as the original. Else it might not be enough faster to be worth having. – Rick James Mar 29 '21 at 03:50
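For what it's worth, a daily summary table along the lines described in these comments might look like this (names are made up; it would be maintained by whatever nightly or on-departure job fits the system):

```sql
CREATE TABLE stay_summary (
  departure_date DATE NOT NULL,
  num_nights     SMALLINT UNSIGNED NOT NULL,
  num_customers  INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (departure_date, num_nights)
);

-- "Customers who stayed at least 5 consecutive nights in the last 30 days"
SELECT SUM(num_customers)
  FROM stay_summary
 WHERE num_nights >= 5
   AND departure_date >= CURRENT_DATE - INTERVAL 30 DAY;
```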