2

All of explanations of phantom reads I managed to find demonstrate phantom read by running 2 select statements in one transaction (e.g. https://blobeater.blog/2017/10/26/sql-server-phantom-reads/ )

BEGIN TRAN

    SELECT #1

    DELAY DURING WHICH AN INSERT TAKES PLACE IN A DIFFERENT TRANSACTION

    SELECT #2

END TRAN

Is it possible to reproduce a phantom read in one select statement? This would mean that select statement starts on transaction #1. Then insert runs on transaction #2 and commits. Finally select statement from transaction #1 completes, but does not return a row that transaction #2 has inserted.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 3
    No, in order to experience a phantom read, you need **at least** two separate transactions and statements – marc_s Aug 03 '18 at 10:22
  • It is possible that a single select statement running the default `READ COMMITTED` isolation level will miss rows inserted by other transactions. However, this is not a phantom read. – Dan Guzman Aug 03 '18 at 10:32

2 Answers2

2

The SQL Server Transaction Isolation Levels documentation defines a phantom row as one "that matches the search criteria but is not initially seen" (emphasis mine). Consequently, more than one SELECT statement is needed for a phantom read to occur.

Data inserted during execution SELECT statement execution might not be returned in the READ COMMITTED isolation level depending on the timing but this is not a phantom read by definition. The example below shows this behavior.

--create table with enough data for a long-running SELECT query
CREATE TABLE dbo.PhantomReadExample(
      PhantomReadExampleID int NOT NULL
        CONSTRAINT PK_PhantomReadExample PRIMARY KEY
    , PhantomReadData char(8000) NOT NULL
    );
--insert 100K rows
WITH 
     t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    ,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
    ,t1m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a CROSS JOIN t1k AS b)
INSERT INTO dbo.PhantomReadExample WITH(TABLOCKX) (PhantomReadExampleID, PhantomReadData)
SELECT num*2, 'data'
FROM t1m
WHERE num <= 100000;
GO

--run this on connection 1
SELECT *
FROM dbo.PhantomReadExample
ORDER BY PhantomReadExampleID;
GO
--run this on connection 2 while the connection 1 SELECT is running
INSERT INTO dbo.PhantomReadExample(PhantomReadExampleID, PhantomReadData) 
    VALUES(1, 'data');
GO

Shared locks are acquired on rows as they are read during the SELECT query scan to ensure only committed data are read but these are immediately released once data are read improve concurrency. This allows other sessions to insert, update, and delete rows while the SELECT query is running.

The inserted row is not returned in this case because the ordered clustered index scan had already past the point of the insert.

Dan Guzman
  • 43,250
  • 3
  • 46
  • 71
  • With Read Commited, I had thought a shared lock would prevent an update to rows that were being read. Which means that the inserted row would not be read even if it were added in a place not yet scanned? If I am incorrect I will be happy to have learned something today. – Joe C Aug 03 '18 at 11:21
  • One missing piece i think would be a way to keep the select running while the until insert is complete. I did this by running statements in this order: (con. 3) begin tran, update; (con. 1) select; (con 2) insert; (con 3) commit. – Justinas Simanavicius Aug 03 '18 at 11:32
  • @JustinasSimanavicius, I forgot to add the table create and load script. This inserts enough data for a long-running `SELECT` query that can reproduce the issue with autocommit transactions. – Dan Guzman Aug 03 '18 at 11:45
  • @JoeC I believe that shared locks are released right after they are read ( https://stackoverflow.com/a/941960/2128373 ) meaning they can be updated after being read. In my experiment I managed to keep the select running, then insert a row into not yet scanned location, and finally get that in select results. – Justinas Simanavicius Aug 03 '18 at 11:45
  • @JoeC, shared locks are acquired on rows read for read consistency but those are immediately released when no longer needed by the query. I added that detail to my answer. – Dan Guzman Aug 03 '18 at 11:46
  • "not initially seen" - doesn't rule out the possibility that it was not initially seen in a the execution of a single statement and then seen later in that same execution – Martin Smith Aug 03 '18 at 15:51
  • @DanGuzman That's cool, I upvoted your answer by the way as it is a good one. However I am still unclear. Does the shared lock take place on all records to be read, releasing them after the read? Or does it only place a shared lock on a block of records at a time so that if a record is not yet read (locked) you can get changes made after the query started but before the record was read. My understanding is that the former was true and not the latter but I am not 100% sure. – Joe C Aug 03 '18 at 18:30
  • @JoeC, for the example query, an intent shared lock is acquired on individual pages (block of records) as they are read and released as the scan proceeds on to subsequent pages. This way, the entire table is not locked and inserts/updates/deletes can occur while the query is executing. The implication is rows inserted during query execution might or might not be returned depending on the position of the scan and the insert point. – Dan Guzman Aug 04 '18 at 00:19
2

Below is the wikipedia definition of phantom reads

A phantom read occurs when, in the course of a transaction, new rows are added by another transaction to the records being read.

This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfill that WHERE clause.

This is certainly possible to reproduce in a single reading query (of course other database activity must also be happening to produce the phantom rows).

Setup

CREATE TABLE Test(X INT PRIMARY KEY);

Connection 1 (leave this running)

SET NOCOUNT ON;
WHILE 1 = 1
INSERT INTO Test VALUES (CRYPT_GEN_RANDOM(4))

Connection 2

This is extremely likely to return some rows if running at read committed lock isolation level (default for the on premise product and enforced with table hint below)

WITH CTE AS
(
SELECT *
FROM Test WITH (READCOMMITTEDLOCK)
WHERE X BETWEEN 0 AND 2147483647
)
SELECT *
FROM CTE c1
FULL OUTER HASH JOIN CTE c2 ON c1.X = c2.X
WHERE (c1.X IS NULL OR c2.X IS NULL)

The returned rows are values added between the first and second read of the table for rows matching the WHERE X BETWEEN 0 AND 2147483647 predicate.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845