250

What is the difference between non-repeatable read and phantom read?

I have read the Isolation (database systems) article on Wikipedia, but I still have a few doubts. In the example below, which will happen: a non-repeatable read or a phantom read?

#### Transaction A

SELECT ID, USERNAME, accountno, amount FROM USERS WHERE ID=1

#### OUTPUT:

1----MIKE------29019892---------5000

#### Transaction B

UPDATE USERS SET amount=amount+5000 where ID=1 AND accountno=29019892;
COMMIT;

#### Transaction A

SELECT ID, USERNAME, accountno, amount FROM USERS WHERE ID=1

Another doubt: in the above example, which isolation level should be used, and why?

Super Kai - Kazuya Ito
user1357722

12 Answers

252

From Wikipedia (which has great and detailed examples for this):

A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.

and

A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.

Simple examples:

  • User A runs the same query twice.
  • In between, User B runs a transaction and commits.
  • Non-repeatable read: the row that user A queried has a different value the second time.
  • Phantom read: all the rows in the query have the same values before and after, but different rows are being selected (because B has deleted or inserted some). Example: `select sum(x) from table;` will return a different result even if none of the affected rows themselves have been updated, because rows have been added or deleted.
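The two bullet points above can be sketched as a tiny Python simulation (a toy in-memory model of my own, not a real DBMS; a real database only exhibits these anomalies at sufficiently weak isolation levels):

```python
# Toy in-memory "table"; row 1 mirrors the MIKE row from the question.
table = {1: {"name": "MIKE", "amount": 5000}}

# Non-repeatable read: the same row has different values between two reads.
first = table[1]["amount"]            # transaction A, first read
table[1]["amount"] += 5000            # transaction B commits an UPDATE
second = table[1]["amount"]           # transaction A, second read
print(first, second)                  # 5000 10000

# Phantom read: no previously-read row changed, but the row *set* did.
total_before = sum(r["amount"] for r in table.values())
table[2] = {"name": "JOHN", "amount": 3000}   # B commits an INSERT
total_after = sum(r["amount"] for r in table.values())
print(total_before, total_after)              # 10000 13000
```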

In the above example, which isolation level should be used?

What isolation level you need depends on your application. There is a high cost to a "better" isolation level (such as reduced concurrency).

In your example, you won't have a phantom read, because you select only from a single row (identified by primary key). You can have non-repeatable reads, so if that is a problem, you may want to have an isolation level that prevents that. In Oracle, transaction A could also issue a SELECT FOR UPDATE, then transaction B cannot change the row until A is done.

dade
Thilo
  • 9
    I don't really understand the logic of such a syntax... A **NON-repeatable** read occurs when the read is **repeated** (and a different value obtained)??!... – serhio Apr 15 '14 at 09:30
  • 27
    @serhio "non-repeatable" refers to the fact that you can read a value once and get x as the result, and then read again and get y as the result, so you cannot repeat (non-repeatable) the same results from two separate queries of the same row, because that row value was updated in between reads. – BateTech Sep 09 '15 at 01:09
  • @Thilo Any real use case example where repeatable-read might create issues and where it is necessary ? – user104309 Aug 18 '17 at 21:13
  • What if the PK is modified in another transaction? Could that result in a phantom read? (A strange thing to do in most cases, but not impossible.) – jpmc26 May 13 '18 at 13:25
  • 2
    Both sounds same to me – sn.anurag May 23 '19 at 07:11
  • 8
    The difference is that when you do `count(*) from table` and get back first `42` and then `43` that is NOT a non-repeatable read, because for the 42 rows you selected the first time, you got back the same data the second time. So there was no row retrieved twice with different values. But it is still a phantom read, because you got back an additional row. So all the row values are the same individually, but you are now selecting different rows. @sn.anurag – Thilo May 23 '19 at 07:15
  • So is it possible that Non-repeatable read and Phantom read at the same time? – Spaceship222 Apr 21 '21 at 16:22
  • @Spaceship222 Yep, non-repeatable read and phantom read can occur in the same result set of a read. It has the same nuances as described by others here- one previously seen row has a different value somewhere in it. and there's another row that wasn't there before :) It's like i ask for PrimaryKey,StudentName in a Class. 1st there's (1,Noner). Later in the same transaction i read the students enrolled and see [ (1 , Nonrepeatable) , (2 , Phantom) ]. Noner underwent non-repeatability bc their name changed. Phantom shifted through a wall and appeared to have been there the whole time. – David Mann May 29 '21 at 01:18
  • 3
    The difference is that a non-repeatable read returns different values for *the same logical row*. (For example, if the primary key is employee_id, then a certain employee may have different salaries in the two results.) A phantom read returns two different sets of rows, but for every row that appears in both sets, the column values are the same. – Don Smith Oct 28 '21 at 17:22
200

A simple way I like to think about it is:

Both non-repeatable and phantom reads have to do with data modification operations from a different transaction, which were committed after your transaction began, and then read by your transaction.

Non-repeatable reads are when your transaction reads committed UPDATES from another transaction. The same row now has different values than it did when your transaction began.

Phantom reads are similar but when reading from committed INSERTS and/or DELETES from another transaction. There are new rows or rows that have disappeared since you began the transaction.

Dirty reads are similar to non-repeatable and phantom reads, but relate to reading UNCOMMITTED data, and occur when an UPDATE, INSERT, or DELETE from another transaction is read, and the other transaction has NOT yet committed the data. It is reading "in progress" data, which may not be complete, and may never actually be committed.
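The distinction from a dirty read can be sketched the same toy way (an illustrative in-memory model; the names are mine, not any real API):

```python
# Transaction A's committed state vs its uncommitted working copy.
committed = {"amount": 5000}
working = dict(committed)          # transaction A begins

working["amount"] += 5000          # A updates, but has NOT committed yet
dirty = working["amount"]          # B reads A's uncommitted data: a dirty read

working = dict(committed)          # A rolls back; the 10000 never existed
print(dirty, committed["amount"])  # 10000 5000 -- B acted on data that was never committed
```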

BateTech
  • I am amazed (or say I don't believe). Dirty read is: Reading a modified data (by another transaction) which actually is not committed or may never be committed. Where possibly this scenario could be used? (Its not I don't believe you - Actually I haven't grasped the concept properly) - Please do not mind (You might be able to understand newbie Frustration ) – PHP Avenger Sep 08 '15 at 21:57
  • 4
    It has to do with transaction isolation levels and concurrency. Using the default isolation level, you will not get dirty reads, and in most cases, you want to avoid dirty reads. There are isolation levels or query hints that will allow dirty reads, which in *some* cases is an acceptable trade off in order to achieve higher concurrency or is necessary due to an edge case, such as troubleshooting an in progress transaction from another connection. It is good that the idea of a dirty read doesn't pass the "smell test" for you, bc as a general rule, they should be avoided, but do have a purpose. – BateTech Sep 08 '15 at 23:47
  • What if a delete is followed by an insert, couldn't that be used to effectively achieve an update? – Kevin Wheeler Jul 27 '16 at 22:36
  • @KevinWheeler I'm not sure of the relevance to this post, but a delete followed by an insert is almost always much more overhead compared to an update. You have twice the statements that could be blocked or deadlocked, way more audit data (if tracking dml changes), FK constraint checks, etc. – BateTech Jul 29 '16 at 03:04
  • 1
    @PHPAvenger here is an use case for READ UNCOMMITTED isolation level: there is always a possibility to encounter a deadlock between a select and an update query (explained [here](http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/03/when-index-covering-prevents-deadlocks.aspx)). If the select query is too complex to create a covering index, in order to avoid deadlocks you will want to use a READ UNCOMMITED isolation level with the risk of encountering dirty reads, but how often do you rollback transactions to worry about those dirty reads not being permanent?! – petrica.martinescu Mar 24 '17 at 17:04
  • 1
    @petrica.martinescu the issues caused by dirty reads are NOT just about whether or not a transaction is rolled back. Dirty reads can return very inaccurate results depending on how data in pending transactions has been modified. Imagine a transaction that performs a series of several deletes, updates, and/or inserts. If you read the data in the middle of that transaction using "read uncommitted", it is incomplete. Snapshot isolation level (in SQL Server) is a much better alternative to read uncommitted. A valid use case for read uncommitted isolation level in a production system is rare IMO. – BateTech Mar 03 '18 at 13:09
  • 1
    @BateTech UPDATE or DELETE both can take place for Non-repeatable reads or it is only UPDATE? – Dipon Roy Apr 23 '18 at 06:52
  • 2
    @DiponRoy great question. The locking implemented if using repeatable read (RR) isolation should prevent deletes from occurring on rows that have been selected. I've seen varying definitions of the 2 iso levels over the years, mainly saying phantom is a change in the collection/# rows returned and RR is the same row being changed. I just checked the updated MS SQL documentation says that deletes can cause non-RR (https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/transaction-isolation-levels?view=sql-server-2017 ) so I think it would be safe to group deletes in the RR category too – BateTech Apr 26 '18 at 19:08
  • You said: "*Dirty reads are similar to non-repeatable and phantom reads, but relate to reading UNCOMMITTED data, and occur when an UPDATE, INSERT, or DELETE from another transaction is read, and the other transaction has NOT yet committed the data.*" Are you sure UPDATE, INSERT, or DELETE are included in dirty write? I believe it should be just UPDATE. – MsA Mar 24 '19 at 11:22
  • 2
    @anir yes inserts and deletes are included in dirty reads. Example: start a transaction, insert 2 of 100 invoice lines on connection a, now connection b reads those 2 lines before the trx is committed and before the other 98 lines are added, and so doesn't include all info for the invoice. This would be a dirty read involving an insert. – BateTech Mar 24 '19 at 11:33
  • Earlier I had a doubt whether it can be just UPDATE or all INSERT, UPDATE and DELETE in case of dirty read. However, now I believe, it can be all, as the definition says "... modified by uncommitted transaction". So reading any effect due to uncommitted transaction can lead to dirty read and that effect can be due to any of INSERT, UPDATE and DELETE, (**Q1**) right? Now I have new question, does that need to involve single data item / row or set of them? Again, I believe it can anything as long as it follows the definition "reading effect caused by uncommitted transation". (**Q2**) Right? – MsA Mar 24 '19 at 12:13
  • [..continued from previous comment] specifying two definitions for reference: (1) (From [wiki](https://en.wikipedia.org/wiki/Isolation_(database_systems)#Dirty_reads)) A dirty read (aka uncommitted dependency) occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed. (2) (from [book](https://www.amazon.com/Database-Concepts-Abraham-Silberschatz-Professor/dp/0073523321)) The transaction reads values written by another transaction that hasn’t committed yet. Now can you answer my questions from previous comment? – MsA Mar 24 '19 at 12:15
  • 1
    @anir Q1 the impact to other transactions depends on the transaction scope and isolation level that are used. Q2 yes it can involve a single data item, row, or set of rows, and can also involve indexes. – BateTech Mar 24 '19 at 12:47
94

The Non-Repeatable Read anomaly looks as follows:

Non-Repeatable Read

  1. Alice and Bob start two database transactions.
  2. Bob reads the post record, and the title column value is Transactions.
  3. Alice modifies the title of a given post record to the value of ACID.
  4. Alice commits her database transaction.
  5. If Bob re-reads the post record, he will observe a different version of this table row.

The Phantom Read anomaly can happen as follows:

Phantom Read

  1. Alice and Bob start two database transactions.
  2. Bob reads all the post_comment records associated with the post row with the identifier value of 1.
  3. Alice adds a new post_comment record which is associated with the post row having the identifier value of 1.
  4. Alice commits her database transaction.
  5. If Bob re-reads the post_comment records having the post_id column value equal to 1, he will observe a different version of this result set.

So, while the Non-Repeatable Read applies to a single row, the Phantom Read is about a range of records which satisfy a given query filtering criteria.

Vlad Mihalcea
  • Can `Phantom Read` contain multiple `non-repeatable reads`? – Vivere Feb 14 '22 at 12:35
  • 3
    There's no inclusion operation between these anomalies. The former is about range scans while the latter is about individual records. – Vlad Mihalcea Feb 14 '22 at 14:04
  • Would non repeatable read not cause the lost update problem when Bob tries to update the value based on his last read value? – Ankush Apr 25 '22 at 14:14
  • Is it safe to say repeatable read will never happen if a concurrent transaction executes INSERT/READ? In such a case will it lead to phantom read? Am I right? – Govinda Sakhare Jul 31 '22 at 18:06
  • Repeatable Read is about query stability for single-record result sets while Phantom Read is for the stability of a multi-record result set. So, update/delete interferes with RR while insert/update/delete may lead to PR. – Vlad Mihalcea Jul 31 '22 at 18:41
76

Read phenomena

  • Dirty reads: read UNCOMMITTED data from another transaction
  • Non-repeatable reads: read COMMITTED data from an UPDATE query from another transaction
  • Phantom reads: read COMMITTED data from an INSERT or DELETE query from another transaction

Note: a DELETE statement from another transaction can, in certain cases, also cause a non-repeatable read. It happens when the DELETE statement unfortunately removes the very row which your current transaction was querying. But this is a rare case, and far less likely in a database whose tables have millions of rows each; tables containing transaction data usually have high data volume in any production environment.

Also, in most use cases UPDATEs are more frequent than actual INSERTs or DELETEs (in such cases only the danger of non-repeatable reads remains; phantom reads are not possible). This is why UPDATEs are treated differently from INSERT/DELETE, and the resulting anomaly is also named differently.

There is also an additional processing cost in guarding against INSERTs and DELETEs, beyond just handling UPDATEs.


Benefits of different isolation levels

  • READ_UNCOMMITTED prevents nothing. It's the zero isolation level
  • READ_COMMITTED prevents just one, i.e. Dirty reads
  • REPEATABLE_READ prevents two anomalies: Dirty reads and Non-repeatable reads
  • SERIALIZABLE prevents all three anomalies: Dirty reads, Non-repeatable reads and Phantom reads

Then why not just set the transaction to SERIALIZABLE at all times? The answer: the SERIALIZABLE setting makes transactions very slow, which we again don't want.

In fact, transaction time cost increases in the following order:

SERIALIZABLE > REPEATABLE_READ > READ_COMMITTED > READ_UNCOMMITTED

So READ_UNCOMMITTED setting is the fastest.
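The two lists above can be combined into a small lookup table (this follows the ANSI classification used in this answer; real engines differ, e.g. InnoDB's REPEATABLE_READ also blocks most phantom reads, and the function name below is my own):

```python
# Which read anomalies each isolation level still permits (ANSI classification).
PERMITS = {
    "READ_UNCOMMITTED": {"dirty", "non-repeatable", "phantom"},
    "READ_COMMITTED":   {"non-repeatable", "phantom"},
    "REPEATABLE_READ":  {"phantom"},
    "SERIALIZABLE":     set(),
}

# Levels ordered fastest (weakest) to slowest (strongest).
LEVELS = ["READ_UNCOMMITTED", "READ_COMMITTED", "REPEATABLE_READ", "SERIALIZABLE"]

def weakest_level_preventing(anomalies):
    """Return the fastest level that prevents every anomaly in `anomalies`."""
    for level in LEVELS:
        if not (PERMITS[level] & set(anomalies)):
            return level

print(weakest_level_preventing({"dirty"}))                    # READ_COMMITTED
print(weakest_level_preventing({"dirty", "non-repeatable"}))  # REPEATABLE_READ
```

This makes the trade-off concrete: pick the cheapest level that blocks only the anomalies your use case cannot tolerate.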


Summary

We actually need to analyze the use case and decide on an isolation level, so that we optimize the transaction time while also preventing the anomalies that matter.

Note that a database may have REPEATABLE_READ as its default setting. Admins and architects may have an affinity for choosing this as the default, to exhibit better performance of the platform.

Subhadeep Ray
  • 1
    UPDATE or DELETE both can take place for Non-repeatable reads or it is only UPDATE? – Dipon Roy Apr 23 '18 at 06:53
  • 2
    UPDATE or DELETE both can take place for Non-repeatable reads – niket patel Sep 28 '18 at 06:10
  • Actually we can summarize that on an average a random DELETE statement executed by another transaction on the same database has very low probability of causing non-repeatable reads for the current transaction. But the same delete statement has 100% chance of causing a Phantom read for the current transaction. Looking it that way, my writing is a bit wrong if you take it word for word. But hey, I intentionally wrote it this way to make things more clear to the reader. – Subhadeep Ray Apr 11 '19 at 08:40
  • +1 for a simple and easy to understand explanation. However I think most databases ( oracle , mysql ) have a default isolation level of Read Committed and probably postgress uses default of repeatable_read – akila Nov 12 '19 at 23:44
  • @akila - I am lying. ;-) Like I have already mentioned. :-) I am mentioning the boundary case. – Subhadeep Ray Sep 27 '20 at 01:30
  • @SubhadeepRay I am facing a situation where this is the flow `Transaction 1 -> Read integer from DB` `Transaction 2 -> Read integer from DB` `Transaction 1 -> Increment, Write and Commit to DB` `Transaction 2 -> Increment, Write and Commit to DB` Now this flow causes a missed increment because both transactions read the same integer value . So is there a way to protect against this flow which doesn't seem to fall under any of the 3 categories of dirty read, non repeatable read or a phantom read ? – abhaybhatia Apr 24 '21 at 02:24
12

There is a difference in the implementation between these two kinds of isolation levels.
To prevent "non-repeatable read", row locking is needed.
To prevent "phantom read", scoped (range) locking is needed, possibly even table locking.
We can implement both levels using the two-phase locking protocol.
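As a rough sketch of that protocol, here is strict two-phase locking in miniature (a toy model with invented names; a real engine would block rather than fail, and distinguishes shared from exclusive locks):

```python
# Strict two-phase locking: acquire locks while working (growing phase),
# release them all only at commit (shrinking phase).
class Transaction:
    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared map: resource -> owning transaction
        self.held = []

    def lock(self, resource):
        # Growing phase: acquire, or fail if another transaction owns the lock.
        owner = self.lock_table.get(resource)
        if owner is not None and owner is not self:
            return False
        self.lock_table[resource] = self
        self.held.append(resource)
        return True

    def commit(self):
        # Shrinking phase: release everything at once.
        for resource in self.held:
            del self.lock_table[resource]
        self.held.clear()

locks = {}
t1, t2 = Transaction(locks), Transaction(locks)
print(t1.lock("row:1"))   # True:  T1 row-locks the row it read
print(t2.lock("row:1"))   # False: T2 cannot touch it -> no non-repeatable read
t1.commit()
print(t2.lock("row:1"))   # True:  after T1 commits, T2 may proceed
```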

egraldlo
  • To implement repeatable read or serializable, there is no need to use row-locking. –  Aug 02 '17 at 20:53
6

In a system with non-repeatable reads, the result of Transaction A's second query will reflect the update in Transaction B - it will see the new amount.

In a system that allows phantom reads, if Transaction B were to insert a new row with ID = 1, Transaction A will see the new row when the second query is executed; i.e. phantom reads are a special case of non-repeatable read.

Jeffrey Kemp
  • I don't think the explanation of a phantom read is correct. You can get phantom reads even if uncommitted data is never visible. See the example on Wikipedia (linked in the comments above). – Thilo Jun 15 '12 at 05:14
6
  • Non-repeatable read (fuzzy read): a transaction reads the same row at least twice, but the row's data differs between the 1st and 2nd reads, because another transaction updated that row's data and committed in the meantime (concurrently).

  • Phantom read: a transaction reads the same table at least twice, but the number of rows differs between the 1st and 2nd reads, because another transaction inserted or deleted rows and committed in the meantime (concurrently).

I experimented with non-repeatable read and phantom read in MySQL using two command prompts.

For the experiments, I set the READ COMMITTED isolation level so that non-repeatable read and phantom read can occur:

SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

And I created a person table with id and name as shown below.

person table:

| id | name  |
|----|-------|
| 1  | John  |
| 2  | David |

First for non-repeatable read, I did these steps below with MySQL queries:

| Flow   | Transaction 1 (T1)                            | Transaction 2 (T2)                           | Explanation |
|--------|-----------------------------------------------|----------------------------------------------|-------------|
| Step 1 | BEGIN;                                        |                                              | T1 starts. |
| Step 2 |                                               | BEGIN;                                       | T2 starts. |
| Step 3 | SELECT * FROM person WHERE id = 2; → 2 David  |                                              | T1 reads David. |
| Step 4 |                                               | UPDATE person SET name = 'Tom' WHERE id = 2; | T2 updates David to Tom. |
| Step 5 |                                               | COMMIT;                                      | T2 commits. |
| Step 6 | SELECT * FROM person WHERE id = 2; → 2 Tom    |                                              | T1 reads Tom instead of David after T2 commits. *Non-repeatable read occurs!!* |
| Step 7 | COMMIT;                                       |                                              | T1 commits. |

Second for phantom read, I did these steps below with MySQL queries:

| Flow   | Transaction 1 (T1)                             | Transaction 2 (T2)                    | Explanation |
|--------|------------------------------------------------|---------------------------------------|-------------|
| Step 1 | BEGIN;                                         |                                       | T1 starts. |
| Step 2 |                                                | BEGIN;                                | T2 starts. |
| Step 3 | SELECT * FROM person; → 1 John, 2 David        |                                       | T1 reads 2 rows. |
| Step 4 |                                                | INSERT INTO person VALUES (3, 'Tom'); | T2 inserts the row (3, Tom) into the person table. |
| Step 5 |                                                | COMMIT;                               | T2 commits. |
| Step 6 | SELECT * FROM person; → 1 John, 2 David, 3 Tom |                                       | T1 reads 3 rows instead of 2 rows after T2 commits. *Phantom read occurs!!* |
| Step 7 | COMMIT;                                        |                                       | T1 commits. |
Henry Ecker
Super Kai - Kazuya Ito
  • Obviously, with inserts and updates, it's a different story, but is there a reason when non-repeatable reads would be an issue? In my mind, the "true" state is always reflected across N reads and that's good. – eja Mar 12 '23 at 08:25
  • i am able to reproduce phantom reads at READ COMMITTED isolation level, but not reproducing at REPEATABLE READ isolation level – nkalra0123 Apr 23 '23 at 07:54
2

The accepted answer indicates, most of all, that the so-called distinction between the two is actually not significant at all.

If "a row is retrieved twice and the values within the row differ between reads", then they are not the same row (not the same tuple, in correct RDB speak), and it is then indeed by definition also the case that "the collection of rows returned by the second query is different from the first".

As to the question "which isolation level should be used", the more your data is of vital importance to someone, somewhere, the more it will be the case that Serializable is your only reasonable option.

Erwin Smout
1

Let's say you have a table named Employee with columns ID and Name, containing the following data:

| ID | Name  |
|----|-------|
| 1  | Mike  |
| 2  | Peter |
  1. You start a transaction.
  2. You select all the records of table Employee.
  3. You look up the name of the record with ID=1; the value is Mike.
  4. You look up the number of records in the Employee table; the value is 2.
  5. In the same transaction, you again select all the records of the Employee table, look up the name attribute of the record with ID=1, and look up the number of records in the table. In the non-repeatable read scenario you cannot guarantee that the name value will still be Mike. In the phantom read scenario you cannot guarantee that the number of employees will still be 2; it might be more than 2.

Why? Because between the two selects (steps 2 and 5), some other database session might have modified the name value or added a new record to the table.

Now let's see how MySQL solves these problems with different isolation levels. In this post, let's only consider the REPEATABLE_READ and SERIALIZABLE levels.

When the database has the REPEATABLE_READ isolation level, in step 2 MySQL adds a read lock to the selected rows. Note that this is not a table-level lock, so other sessions can still add new records to the table and commit, but they cannot delete or update the selected rows, since we hold the read lock on those items. Even after the select query finishes, the read lock remains active until we commit. Because of this, once you have read certain rows in a transaction, you will see the same data in those rows unless you modify them in the same transaction.

When the database has the SERIALIZABLE isolation level, in step 2, on top of the read lock we also get a range lock. This range lock is likewise only released when we commit the transaction. Range locks are used to prevent other transactions from inserting new records into the locked range.
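The contrast between those two lock types can be sketched with a toy model (illustrative names only, not InnoDB's actual lock manager):

```python
# Row locks protect rows already read; range (gap) locks protect a predicate,
# i.e. the "space" where matching rows could appear.
class Locks:
    def __init__(self):
        self.row_locks = set()      # ids read under REPEATABLE_READ
        self.range_locks = []       # (lo, hi) ranges locked under SERIALIZABLE

    def can_update(self, row_id):
        return row_id not in self.row_locks

    def can_insert(self, row_id):
        # Inserts are never blocked by row locks -- exactly why
        # REPEATABLE_READ still allows phantom reads.
        return all(not (lo <= row_id <= hi) for lo, hi in self.range_locks)

locks = Locks()
locks.row_locks.update({1, 2})         # T1 selected rows 1 and 2
print(locks.can_update(1))             # False: no non-repeatable read possible
print(locks.can_insert(3))             # True:  a phantom row can still appear

locks.range_locks.append((1, 10**9))   # under SERIALIZABLE, the range is locked
print(locks.can_insert(3))             # False: phantom read prevented
```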

Tharsanan
0

I think there are some differences between non-repeatable read and phantom read.

Non-repeatable read concerns two transactions, A and B. If B could notice A's modifications before A commits, dirty reads could happen, so we only let B notice A's modifications after A commits.

That creates a new issue: since B notices A's modifications after A commits, A may modify the value of a row that B is holding. If B reads that row again, B will get a value different from the one it got the first time. We call this non-repeatable. To deal with the issue, we let B remember the state of the data (a snapshot) when B starts.

Now think about that new solution, and notice there is a new issue as well: because we let B remember a snapshot, whatever happens in A cannot affect B. But if B wants to insert some data into the table, and checks the table to make sure there is no such record, while that data has in fact already been inserted by A, then an error may occur. We call this a phantom read.

BartoszKP
Han R
0

Repeatable read is an isolation level, while phantom read (reading values committed by other transactions) is a read phenomenon (a type of read, like dirty read or snapshot read). The repeatable read isolation level allows phantom reads, but not dirty reads or non-repeatable reads.

sn.anurag
0

Both non-repeatable reads and phantom reads result from one transaction T1 seeing changes from another transaction T2 that commits before T1 is complete. The difference is that a non-repeatable read returns different values for the same logical row. (For example, if the primary key is employee_id, then a certain employee may have different salaries in the two results.) A phantom read returns two different sets of rows, but for every row that appears in both sets, the column values are the same.

Don Smith