7

I am reading through section 13.2 of the PostgreSQL Manual, but the textual descriptions found there are not enough clear, and lack examples.

For instance the following two paragraphs are not clear to whom is learning PostgreSQL:

INSERT with an ON CONFLICT DO UPDATE clause behaves similarly. In Read Committed mode, each row proposed for insertion will either insert or update. Unless there are unrelated errors, one of those two outcomes is guaranteed. If a conflict originates in another transaction whose effects are not yet visible to the INSERT, the UPDATE clause will affect that row, even though possibly no version of that row is conventionally visible to the command."

and

The Repeatable Read mode provides a rigorous guarantee that each transaction sees a completely stable view of the database. However, this view will not necessarily always be consistent with some serial (one at a time) execution of concurrent transactions of the same level. For example, even a read only transaction at this level may see a control record updated to show that a batch has been completed but not see one of the detail records which is logically part of the batch because it read an earlier revision of the control record.

Can someone give examples clarifying what is in these two paragraphs?

Does anyone know where can I find a formal description of the behavior of PostgreSQL isolation levels? I am looking for this because it is an advanced topic which I believe a formal description would help clarify how it works, and thus, help avoid concurrency bugs between transactions.

UPDATE: Another doubt I have is how does a serializable transaction is handled in terms of how the database machinery decides to commit or abort it, when it can run concurrently with other transactions at other isolation levels? Does the database decides on the result of the serializable transaction as if the other transactions were ran with serializable isolation too?

Thanks

UPDATE 2: So far the best I found regarding implementation details of isolation levels is the PostgreSQL Wiki Serializable Page.

mljrg
  • 4,430
  • 2
  • 36
  • 49
  • I'd like to know why there are two recommendations to close this question. A potential response seems to be very useful to grab a clear understanding of this so much important database programming topic. – mljrg Jun 06 '18 at 17:25
  • I think there are two problems with this question: firstly, an answer in the form of an offsite link might well break at some point in the future; and secondly, it's not clear exactly what extra detail would need to be in an answer - what is it specifically that you find unclear about the descriptions you found, and how would a "formal description" differ? – IMSoP Jun 06 '18 at 17:41
  • Note that by "problems" I don't mean it's not *useful*, but that it doesn't fit very well *on this site*. If you [edit] the question to focus on one of the two paragraphs (you can always ask a separate question for the other), and explain which parts you already understand, the request for an explanation and example might be better suited for this format. – IMSoP Jun 06 '18 at 17:46
  • @IMSoP I adapted my question as per your comments, and now it is more focused on the examples I would like to be explained better, preferably with accompanying SQL. The request for a formal description is now relegated to a second priority, though I believe there is no such description anywhere but in the code itself :-( . Please read my comment to "Laurenz Albe" for what I do not understand in the two paragraphs. Thanks – mljrg Jun 07 '18 at 10:54
  • I'm still confused by what you think a "formal description" would look like, or how it would fit into this site. The Isolation Levels themselves are standardised in the ISO SQL standard, so there will be a formal description there of the guarantees they have to make. The exact details of how Postgres implements those guarantees are, by definition, an implementation detail, so they're only ever going to be *documented*, based on the details of the current code. – IMSoP Jun 07 '18 at 11:05
  • Meanwhile, I suggested *narrowing down the question* to one example, and one concern; you seem to instead have added *additional* questions ("another doubt..."). Again, it's not that these questions aren't interesting, it's just that you could write a whole book on them (you've already decided that the whole page of Postgres manual you found isn't enough), so it just doesn't fit into the format of this site. – IMSoP Jun 07 '18 at 11:08
  • @IMSoP By formal description I mean one describing PostgreSQL's transaction handling algorithm(s), such as a high-level pseudo-code description. – mljrg Jun 07 '18 at 11:49
  • @IMSoP Since there is already one tentative answer, perhaps I will wait for its author to respond to my comments. I will accept his answer in case he, or someone else, succeeds clarifying the two paragraphs and my last question. If not, then I will consider creating separate questions to get more visibility, if that's what you recommend. – mljrg Jun 07 '18 at 11:53

2 Answers2

9
  • READ COMMITTED: each SQL statement takes a new snapshot of the database, so every statement will always see changes made by concurrent transactions in the mean time as soon as they are committed. No serialization errors can occur.

  • REPEATABLE READ: the first statement in the transaction takes a snapshot of the database that is retained for the whole transaction, so all statements see the same state of the database. Serialization errors can occur if you try to modify a row that has been modified by a concurrent transaction after your snapshot was taken. This isolation level is no more expensive than READ COMMITTED.

  • SERIALIZABLE: Any transaction that may lead to an outcome that is not consistent with some serial execution order of the transactions will be aborted with a serialization error. There can be false positives. This isolation level is more expensive than the others.

Answers to the specific questions:

  • INSERT ... ON CONFLICT in read committed isolation:

    If transaction 1 has inserted a row, but not yet committed, transaction 2 running INSERT ... ON CONFLICT will wait until transaction 1 has committed or rolled back and then update or insert as appropriate. No constraint violation can occur.

  • Batch jobs and REPEATABLE READ:

    This paragraph is dark; ignore it. It tries to illustrate that two concurrent repeatable read translations can produce a result that is not consistent with any serial execution.

    A better example might be two concurrent transactions that both read the same data and perform updates on them based on the read results. Each of these transactions cannot see the modifications of the other one.

    Look into the PostgreSQL Wiki under "serializable" for more detailed examples.

  • Update question:

    This question is not quite clear to me.

    Serializable transactions take special "SI" locks which track read and write access and survive a commit. They don't block other sessions, but are used to determine if there may be a conflict. Serializable isolation level only works properly if all concurrent transactions use the serializable isolation level.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Nice resume (it cannot be shorter). But can you give examples of the cases described in the two paragraphs above? 1st paragraph: what are the possible "unrelated errors"? What example matches the "If a conflict originates ..." case? 2nd paragraph: can you clarify the example in "... even a read only transaction at this level may see ...", in which case I cannot see how what is described in the rest of the statement "but not see one of ..." can happen? If you can give SQL for your examples it would be much much clear. Thanks – mljrg Jun 07 '18 at 10:41
  • I appended another question at the end of my original question (the **UPDATE** paragraph). Thanks – mljrg Jun 07 '18 at 10:48
  • Ok, clearly explained. The intriguing thing is your last statement "Serializable isolation level only works properly if all concurrent transactions use the serializable isolation level." But what if they do not, what is the outcome of processing mixed isolation levels? Or must we choose only one isolation level for all an application's transactions, which seems very restrictive? That is my question in the appended UPDATE paragraph. – mljrg Jun 07 '18 at 16:54
  • And my answer is yes, you must use serializable for all transactions. – Laurenz Albe Jun 07 '18 at 17:18
  • But that can impact the throughput of the application, that's what everyone says ... If there was somewhere something explaining how transactions with different isolation levels interact, it would be much easier to comprehend the internal concurrency control mechanism of PostgreSQL, and so, choose the most relaxed isolation for each transaction, and thus, achieve greater throughput. – mljrg Jun 07 '18 at 17:35
  • Or, PostgreSQL is so fast, even when all transactions are serializable ... (but I am clueless about any benchmarks supporting this). – mljrg Jun 07 '18 at 17:36
  • `SERIALIZABLE` will be slower, but you would be well advised to benchmark the performance hit for your own workload. – Laurenz Albe Jun 07 '18 at 18:49
1

Regarding the question in UPDATE

Does the database decides on the result of the serializable transaction as if the other transactions were ran with serializable isolation too?"

the answer to is NO.

Serializability is only verified between concurrent transactions at serializable isolation level. For example, given two transactions T1 and T2 interleaved like this:

T1: begin
T1: set transaction isolation level read committed;
T1: update addresses set street = 'Sun street' where id = 1
T2: begin
T2: set transaction isolation level serializable;
T2: select street from addresses where id = 1
T2: update addresses set street = 'Sea street' where id = 2
T1: select street from addresses where id = 2
T1: commit
T2: commit

both T1 and T2 will commit. However, if T1 is set to serializable isolation then T2 will abort.

mljrg
  • 4,430
  • 2
  • 36
  • 49