
Let's imagine you have a table with this definition:

CREATE TABLE public.positions
(
   id serial,
   latitude numeric(18,12),
   longitude numeric(18,12),
   updated_at timestamp without time zone
)
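
To make the scenario reproducible, the table could be filled with synthetic data like this (a sketch; the question does not give the real dataset, so the value ranges here are assumptions chosen so that on the order of 1,000 rows land in the range updated below):

-- Hypothetical test data: 50,000 rows, with latitude spread over 230-280
-- so that roughly 1,000 rows fall between 234.12 and 235.00.
INSERT INTO public.positions (latitude, longitude, updated_at)
SELECT 230 + random() * 50,   -- assumed distribution, not from the question
       random() * 360,
       now()
FROM generate_series(1, 50000);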

And you have 50,000 rows in that table. Now, for testing purposes, you run an update like this:

UPDATE positions
SET updated_at = now()
WHERE latitude BETWEEN 234.12 AND 235.00;

That statement updates 1,000 of the 50,000 rows (in this specific dataset).

If you run that query concurrently in 30 different threads, MySQL InnoDB succeeds, while PostgreSQL fails with lots of deadlocks.

Why?

SDReyes

1 Answer

Plain old luck, I would say.

If thirty threads go ahead and want to update the same 1000 rows, they can either access these rows in the same order (in which case they will lock out each other and do it sequentially) or in different orders (in which case they will deadlock).
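
You can reproduce that deadlock by hand in two psql sessions (a minimal sketch; the two ids are arbitrary):

-- Session A:
BEGIN;
UPDATE positions SET updated_at = now() WHERE id = 1;

-- Session B:
BEGIN;
UPDATE positions SET updated_at = now() WHERE id = 2;
UPDATE positions SET updated_at = now() WHERE id = 1;  -- blocks, waiting for A

-- Session A again:
UPDATE positions SET updated_at = now() WHERE id = 2;  -- deadlock: A waits for B,
                                                       -- B waits for A; one is aborted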

That's the same for InnoDB and PostgreSQL.

To analyze why things work out differently in your case, you should start by comparing the execution plans. Maybe then you will get a clue why PostgreSQL does not access all rows in the same order.

Lacking this information, I'd guess that you are experiencing a PostgreSQL feature introduced in version 8.3 that speeds up concurrent sequential scans:

  • Concurrent large sequential scans can now share disk reads (Jeff Davis)

    This is accomplished by starting the new sequential scan in the middle of the table (where another sequential scan is already in-progress) and wrapping around to the beginning to finish. This can affect the order of returned rows in a query that does not specify ORDER BY. The synchronize_seqscans configuration parameter can be used to disable this if necessary.

Check if your PostgreSQL execution plan uses sequential scans and see if changing synchronize_seqscans avoids the deadlocks.
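
Something along these lines (a sketch; note that EXPLAIN ANALYZE actually executes the UPDATE):

EXPLAIN (ANALYZE, BUFFERS)
UPDATE positions
SET updated_at = now()
WHERE latitude BETWEEN 234.12 AND 235.00;

-- If the plan shows "Seq Scan on positions", disable synchronized
-- sequential scans for the session and re-run the concurrency test:
SET synchronize_seqscans = off;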

Laurenz Albe
  • you're right. you can solve this in postgres by using: UPDATE.... WHERE id in ( SELECT ... FROM ... WHERE ... order by id FOR UPDATE) (see the sketch after these comments). – SDReyes Oct 09 '16 at 18:00
  • I think the *right* solution would be *not* to have 30 threads update the same rows. – Laurenz Albe Oct 10 '16 at 06:44
  • unless we are stressing the DB, which we are, because this is part of a benchmark! xD – SDReyes Oct 10 '16 at 15:50
  • nice idea: Check if your PostgreSQL execution plan uses sequential scans and see if changing synchronize_seqscans avoids the deadlocks. – SDReyes Oct 10 '16 at 15:51
  • @SDReyes If all threads update the same rows, that's not a good stress test, because all but one threads will do nothing (wait on a lock). Have them update *different* rows, then you'll get much higher load. – Laurenz Albe Oct 11 '16 at 06:36
  • what if I tell you MySQL somehow dealt with this test, while pg didn't? – SDReyes Oct 11 '16 at 17:22
  • You already said that. And I told you it is luck that one DBMS handles things one way and another differently. All I tried to point out to you with my comment is that your stress test seems to cause more locking than stress, both in MySQL and in PostgreSQL. – Laurenz Albe Oct 12 '16 at 06:45
  • oohhhh, yeah, I need to test that. thanks Laurenz. will be back with more data about that case – SDReyes Oct 15 '16 at 02:36
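
A fleshed-out version of the workaround from the first comment (a sketch: the ORDER BY id ... FOR UPDATE in the subquery makes every thread lock the matching rows in the same order, so the threads serialize instead of deadlocking):

UPDATE positions
SET updated_at = now()
WHERE id IN (
    SELECT id
    FROM positions
    WHERE latitude BETWEEN 234.12 AND 235.00
    ORDER BY id
    FOR UPDATE
);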