Understanding transactions in sql

Question

The relation R(x) consists of a set of integers --- that is, one-component tuples with an integer component. Alice's transaction is a query:

SELECT SUM(x) FROM R;
COMMIT;

Betty's transaction is a sequence of inserts:

INSERT INTO R VALUES(10);
INSERT INTO R VALUES(20);
INSERT INTO R VALUES(30);
COMMIT;

Carol's transaction is a sequence of deletes:

DELETE FROM R WHERE x=30;
DELETE FROM R WHERE x=20;
COMMIT;

Before any of these transactions execute, the sum of the integers in R is 1000, and none of these integers are 10, 20, or 30. If Alice's, Betty's, and Carol's transactions run at about the same time, and each runs under isolation level READ COMMITTED, which sums could be produced by Alice's transaction? Identify one of those sums from the list below.

a)1040 b)950 c)1030 d)1080

This question is a bit confusing to me because there is not anything that tells us which transaction commits first or if they commit at the same time. How do you interpret this problem?

"which sums could be produced by Alice's transaction?" --- this question implies that you need to evaluate different outcomes (depending on transaction commit order). — zerkms, Feb 19 '17 at 23:26
why 950?. It was specified that non of the tuples contain 10,20 or 30 — daniel, Feb 20 '17 at 00:00

Daniel Vérité · Answer 1 · 2017-02-22T20:13:02.073

Two rules come into play in this problem:

Read Committed isolation level: inside a transaction, each SQL statement sees a fresh version of the data (a new "snapshot"), including the changes that another concurrent transaction might have just committed. It means that to answer the question, the relativer order of statements inside each transaction must be considered, not just when each transaction begins or finishes.
atomicity: the changes in a transaction are not seen by other transactions until it commits. Dirty reads are not possible in PostgreSQL as mentioned in the doc chapter on Transaction Isolation

Let's enumerate the statements:

 Alice
A1: SELECT SUM(x) FROM R;
A2: COMMIT;

 Betty
B1: INSERT INTO R VALUES(10);
B2: INSERT INTO R VALUES(20);
B3: INSERT INTO R VALUES(30);
B4: COMMIT;

 Carol
C1: DELETE FROM R WHERE x=30;
C2: DELETE FROM R WHERE x=20;
C3: COMMIT;

For each of the values a)1040 b)950 c)1030 d)1080, you want to find whether there is an order of execution of these statements such that SELECT SUM(x) FROM R in the A1 step happens to produce that value, given the above-mentioned transaction rules.

a=1040

1040 can be read with this order of execution (among others):

B1
B2
C1 -- DELETE WHERE x=30 does nothing because x=30 does not exist in R.
B3 
B4 -- now A1 would see 10,20,30 in R so SUM(x) is 1060 at this point
C2 -- DELETE WHERE x=20 does remove x=20 which exists and is visible by C
C3 -- commits deletion of x=20
A1 -- now A1 can see 10,30 so SUM(x)=1040
A2 -- commit (no effect, A only reads)

EDIT: some people don't believe this result is possible, so please see the actual demo copy-pasted from postgres sessions at the bottom of the question.

b=950

The values 20,30 are not in the table to begin with, so if C1 and C2 can remove 30 and 20, it would be because B2 and B3 added them, and if they did that, B1 added 10 first, so the result would be 960 at least. In the case when C1 and C2 can't remove 30 or 20 because they're not yet written, the result SUM(x) would be necessarily bigger than 960. So A1 can never see SUM(x) as small as 950, it's impossible.

c=1030

For 1030 to be read by A1, it must see in the table either two rows (10),(20) or one row (30), excluding anything else because the other combinations don't sum up to 30.

The (10,20,30) rows inserted by B1,B2,B3 will become visible all simultaneously because of atomicity. C1 and C2 can undo the effect of B3 and B2 respectively, but if C1 finds and deletes x=30, then C2 will necessarily find and delete x=20 too. A1 cannot see the result of C1 before C3 runs, and if C3 has run, C2 has too and so both 30 and 20 are gone from the visibility of A, so it sees (10) only, not (10),(20). Seeing (30) alone is also impossible because if (30) was present, then (10) would be too, since x=10 is inserted by B1 and never deleted afterwards.

So A1 can never see SUM(x)=1030.

d=1080

1000+10+20+30 = 1060 and the code is not adding any more rows, so it's impossible to obtain more than 1060, no matter what the execution order.

EDIT:

Actual demo with postgresql 9.5

# Init 
$ psql -d test

test=# create table R(x int);
CREATE TABLE
test=# insert into R values(1000);
INSERT 0 1
test=# show transaction_isolation ;
 transaction_isolation 
-----------------------
 read committed
(1 row)

test=# \q

** Start 3 distinct concurrent sessions A, B, C with different prompts to distinguish:

$ psql -d test
\set PROMPT1 'alice%R%# '
alice=# 

$ psql -d test
\set PROMPT1 'betty%R%# '
betty=# 

$ psql -d test
\set PROMPT1 'carol%R%# '
carol=#

Now let's run the commands in the order mentioned, what session does what being made obvious by the prompts.

alice=# begin;
BEGIN

betty=# begin;
BEGIN

carol=# begin;
BEGIN

betty=# INSERT INTO R VALUES(10);
INSERT 0 1

betty=# INSERT INTO R VALUES(20);
INSERT 0 1

carol=# DELETE FROM R WHERE x=30;
DELETE 0

betty=# INSERT INTO R VALUES(30);
INSERT 0 1

betty=# commit;
COMMIT

carol=# DELETE FROM R WHERE x=20;
DELETE 1

carol=# commit;
COMMIT

alice=# select sum(x) from r;
 sum  
------
 1040
(1 row)

alice=# commit;
COMMIT

Answer is not correct. If it is correct for Postgres it means it is a trash. Actually there is a locking mechanizm also involved. I never saw that behavior in 20 years working with Oracle and explicitly tested scenarion with SQL-Server right now. There is no such thing as "phantom read". Either inserts are not visible or table locked by INSERT transaction. So... If INSERT starts first then answer is 1010. if DELETE answer is 1060, if SELECT answer is 1000 — Vadim, Feb 22 '17 at 00:01
@Vadim: phantom reads are abundantly documented including by Oracle itself: [On Transaction Isolation Levels](http://www.oracle.com/technetwork/issue-archive/2010/10-jan/o65asktom-082389.html) or [stackoverflow: what is difference between non-repeatable read and phantom read?](http://stackoverflow.com/questions/11043712). Aside from that, I challenge you to test your answer with postgres. — Daniel Vérité, Feb 22 '17 at 19:33

score 0 · Answer 2 · answered Feb 20 '17 at 03:57

0

If they run at the same time, and commit at the same time, then Alice should get a value of 1000. This is because the isolation level is READ COMMITTED which won't read any of the inserts or deletes from the other transactions until they are committed.

It really depends on the order of the COMMITs. Without knowing that, you can't know for sure what the answer will be.

So, my answer is E) 1000. Could you clarify with whoever gave you this question what the commit order is?

answered Feb 20 '17 at 03:57

bbrumm

1,342
1
8
13

1000 - when Alice's SELECT ran before any other COMMIT – Vadim Feb 20 '17 at 04:02
That's right, and like you said in your answer, COMMIT is not needed for SELECT. – bbrumm Feb 20 '17 at 04:02

score 0 · Answer 3 · answered Feb 20 '17 at 04:00

0

With CRUD transactions, there are no such thing as "about the same time".

As long as it is READ COMMITTED It depends on what CRUD transaction commits first before Alice's SELECT runs. Non of answers are correct.

For SELECT transaction COMMIT has no meaning.

There are only 3 possible answers

1000, 1060, 1010

answered Feb 20 '17 at 04:00

Vadim

4,027
2
10
26

The answer is 1040 because remember that read committed guarantees that reading dirty data is not allowed , but it says nothing about new insertions, so phantom reads are allowed – daniel Feb 20 '17 at 05:10
@daniel can you show the math how did you get 1040, when one trasaction inserts 60 and another deletes 50? and what "phantom read" is? – Vadim Feb 20 '17 at 13:53
phantom read is when new data is inserted. That's different from dirty read which is when data is updated. – daniel Feb 20 '17 at 22:33
So you get 1040 in the following way: execute Betty's transaction first and once one insertion is performed, execute carol's transaction. – daniel Feb 20 '17 at 22:34
Note that phantom reads are allowed, so all the insertions will be visible to any transaction. – daniel Feb 20 '17 at 22:36
To me it means if phantom read happens - main principal and purpose of transaction broken. "Execute Betty's transaction first and once one insertion is performed, execute carol's transaction" - There is no way to do anything between insertions in Betty's transaction because it is a transaction. But without transaction (or auto-commit after each statement) - 1040 is only one of possible scenarios. – Vadim Feb 21 '17 at 00:06
I mean until COMMIT nothing from Betty's transaction is visible to Carol's transaction – Vadim Feb 21 '17 at 00:09
No, insertions are visible. Updations are not visible until commit. – daniel Feb 21 '17 at 22:58
If you want to get rid of this transaction feature, add the key word "repeatable read" in your ISOLATION statement. – daniel Feb 21 '17 at 23:04
I've added an actual demo of how you get `1040` in the answer at http://stackoverflow.com/a/42356729/238814. Other results such as `1000` or `1010` are obviously possible but they are not mentioned in the problem statement. – Daniel Vérité Feb 22 '17 at 20:10
"*For SELECT transaction COMMIT has no meaning*" - that's only true for `read committed` isolation level. With `repeatable read` or `serializable` a commit _does_ make a difference even in a read-only transaction. – Feb 22 '17 at 20:12

Understanding transactions in sql

3 Answers3

Actual demo with postgresql 9.5