0

This is something very disturbing I stumbled upon while stress-testing an application using Sybase ASE 15.7.

We have the following table:

CREATE TABLE foo
(
    i INT NOT NULL,
    blob  IMAGE    
);
ALTER TABLE foo ADD PRIMARY KEY (i);

The table has, even before starting the test, a single row with some data in the IMAGE column. No rows are either deleted or inserted during the test. So the table always contains a single row. Column blob is only updated (in transaction T1 below) to some value (not NULL).

Then, we have the following two transactions:

T1: UPDATE foo SET blob=<some not null value> WHERE i=1
T2: SELECT * FROM foo WHERE i=1

For some reason, the above transactions may deadlock under load (approx. 10 threads doing T1 20 times in a loop and another 10 threads doing T2 20 times in loop).

This is already weird enough, but there's more to come. T1 is always chosen as the deadlock victim. So, the application logic, on the event of a deadlock (error code 1205) simply retries T1. This should work and should normally be the end of the story. However …

… it happens that sometimes T2 will retrieve a row in which the value of the blob column is NULL! This is even though the table already starts with a row and the updates simply reset the previous (non-NULL) value to some other (non-NULL) value. This is 100% reproducible in every test run.

This is observed with the READ COMMITTED serialization level.

I verified that the above behavior also occurs with the TEXT column type but not with VARCHAR.

I've also verified that obtaining an exlusive lock on table foo in transaction T1 makes the issue go away.

So I'd like to understand how can something that so fundamentally breaks transaction isolation be even possible? In fact, I think this is worse than transaction isolation as T1 never sets the value of the blob column to NULL.

The test code is written in Java using the jconn4.jar driver (class com.sybase.jdbc4.jdbc.SybDriver) so I don't rule out that this may be a JDBC driver bug.

update

This is reproducible simply using isql and spawning several shells in parallel that continuously execute T1 in a loop. So I am removing the Java and JDBC tags as this is definitely server-related.

Marcus Junius Brutus
  • 26,087
  • 41
  • 189
  • 331
  • no answers, just some questions/comments: might be of interest to know the lock types and page numbers reported in the deadlock messages (page numbers can be used to determine page type, eg, data, text or index); which locking scheme are you using (allpages vs datapages vs datarows)? to rule out a potential client side issue you may want to try running your tests via the `isql` command line tool (shouldn't be too hard to write a small shell script to spawn the 20 sessions eh); how are you verifying use of READ COMMITTED isolation level? – markp-fuso Nov 02 '17 at 21:33
  • @markp yes we are currently trying to repro using `sqsh` alone. `READ COMMITED` is explicitly set on the connection using methods of the JDBC API. – Marcus Junius Brutus Nov 02 '17 at 21:42
  • how did you solve your issue afterall? – Moudiz Nov 06 '18 at 08:58
  • @Moudiz sent bug report to Sybase. They didn't even bother to try on the exact version we told them we used (as per our support contract) and simply replied after quite some time that they couldn't reproduce it "in the latest version". Sybase is abandonware plain and simple. – Marcus Junius Brutus Nov 06 '18 at 14:53
  • @MarcusJuniusBrutus well I am having similar issue on prod database, I am trying to check in-row on the image columns if it can solve it – Moudiz Nov 06 '18 at 15:05
  • If your text column is nullable, you should also consider setting the dealloc_first_txtpg via sp_chgattribute against the table. This prevents the wasting of a page per row in the text chain for any row where the text column is null. You need to either select into or BCP the data out/in for this to deallocate the space for the null data though. – Rich Campbell Nov 07 '18 at 08:18

2 Answers2

0

Your example create table code by default would create an allpages locked table unless your DBA has changed the system-wide 'lock scheme' parameter via sp_configure to another value(you can check this yourself as anyone via sp_configure 'lock scheme'.

Unless you have a very large number of rows they are all going to be sat on a single data page because an int is only 4 bytes long and the blob data is stored at the end of the table (unless you use the in-row LOB functionality in ASE15.7 and up). This is why you are getting deadlocks. You have by definition created a single hotspot where all the data is being accessed at the page level. This is even more likely where larger page sizes > 2k are used, since by their nature they will have even more rows per page and with allpages locking, even more likelihood of contention.

Change your locking scheme to datarows (unless you are planning to have very high rowcounts) as has been said above and your problem should go away. I will add that your blob column looks to allow nulls from your code, so you should also consider setting the 'dealloc_first_txtpg' attribute for your table to avoid wasted space if you have nulls in your image column.

Rich Campbell
  • 566
  • 2
  • 9
  • Changing the lock schema is not a solution , my table is datarows and I am facing the same issue. locking and blocking due table containes LOB columns – Moudiz Nov 06 '18 at 08:58
  • @Moudiz : I've replied on your other post re the in-row lob perhaps some comments about your table structure and indexes and specific update queries may help identify when you're getting blocking. – Rich Campbell Nov 06 '18 at 11:39
  • This is about more than just deadlocks; how is it possible that transaction T2 reads the blob IMAGE column data as NULL unless there's something horribly broken in Sybase's internals? – Marcus Junius Brutus Nov 06 '18 at 17:52
  • He didnt say it's reading real data as null but rather that it still scans the text chain for a row where the text column is null (at least thats how I read it). Which is normal as it doesnt know that it's null until it checks the text column, which is why using the dealloc_first_txtpg setting for the table is a good idea as no text chain is stored for nulls then. – Rich Campbell Nov 07 '18 at 08:20
0

We've seen all kinds of weird stuff with isolation level 1. I'm under the impression that when T2 is in progress, T1 can change data and T2 might return intermediate result of T1.

Try isolation level 2 and see if it helps (does for us).

jpp
  • 159,742
  • 34
  • 281
  • 339
CSBob
  • 39
  • 2