
I know that MySQL supports auto-increment values, but not dependent auto-increment values.

i.e. if you have a table like this:

id | element | innerId
1  | a       | 1
2  | a       | 2
3  | b       | 1

and you insert another b-element, you need to compute the innerId on your own (the expected innerId would be 2).

  • Is there a database supporting something like this?

What would be the best way to achieve this behaviour? I do not know the number of elements, so I cannot create dedicated tables for them, from which I could simply derive an id.

(The example is simplified.)

The goal is that every element "type" (where the number of types is unknown, possibly unbounded) should have its own, gap-less id.

If I were to use something like

INSERT INTO myTable
  (id, element, innerId)
VALUES
  (null, 'b', (SELECT COUNT(*) FROM myTable t2 WHERE t2.element = 'b') + 1)

http://sqlfiddle.com/#!2/2f4543/1

Will this return the expected result under all circumstances? I mean, it works, but what about concurrency? Are inserts with subselects still atomic, or might there be a scenario where two inserts will try to insert the same id? (Especially if a transactional insert is pending?)

Would it be better to achieve this in the programming language (e.g. Java)? Or is it easier to implement this logic as close to the database engine as possible?

Since I'm using an aggregation to compute the next innerId, I think SELECT...FOR UPDATE cannot avoid the problem in case other transactions have pending commits, right?

PS: I could, of course, just brute-force the insert - starting at the current max value per element - with a unique key constraint on (element, innerId), retrying until there is no unique-key violation - but isn't there a nicer way?
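A sketch of that brute-force approach (based on the simplified table above; the constraint name is made up): first enforce uniqueness, then let the application retry the insert whenever it hits a duplicate-key error (MySQL error 1062):

-- Unique constraint so two concurrent inserts can never commit the
-- same (element, innerId) pair; one of them will fail with error 1062.
ALTER TABLE myTable
  ADD CONSTRAINT uq_element_inner UNIQUE (element, innerId);

-- Attempted insert; COALESCE covers the first row of a new element type.
-- On a duplicate-key error, the application re-issues this statement.
INSERT INTO myTable (id, element, innerId)
VALUES (null, 'b',
        (SELECT COALESCE(MAX(t2.innerId), 0) + 1
           FROM myTable t2
          WHERE t2.element = 'b'));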

According to "Make one ID with auto_increment depending on another ID - possible?", it would be possible with a composite primary key on - in my case - innerId and element. But according to "Setting MySQL auto_increment to be dependent on two other primary keys", that works only for MyISAM (I have InnoDB).


Now I'm even more confused. I tried using two different PHP scripts to insert data with the query above. Script one sleeps for 15 seconds, giving me time to call script two (which simulates the concurrent modification). The result was correct when each modification was a single query.

(PS: the mysql_* (not mysqli) functions are used only for quick debugging.)

Base Data:

[screenshot: base table data]

Script 1:

mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage) VALUES (null, 'a', (SELECT MAX(t2.innerId) FROM insertTest t2 WHERE element='a') + 1, 'page1')");

sleep(15);

//mysql_query("ROLLBACK;");
mysql_query("COMMIT;");

Script 2:

//mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage) VALUES (null, 'a', (SELECT MAX(t2.innerId) FROM insertTest t2 WHERE element='a') + 1, 'page2')");
//mysql_query("COMMIT;");

I would have expected the page2 insert to happen before the page1 insert, because it runs without any transaction. But in fact, the page1 insert happened FIRST, causing the second script to also be delayed for about 15 seconds...

(ignore the AC-Id, played around a bit)

[screenshot: inserted rows]

When using ROLLBACK in the first script, the second script is still delayed for 15 seconds, and then picks up the correct innerId:

[screenshot: rows after rollback]

So:

  • Non-transactional inserts are blocked while a transaction is active.
  • Inserts with subselects also seem to be blocked.
  • So in the end, it seems an insert with a subselect is an atomic operation? Why else would the SELECT of the second page have been blocked?

Using the selection and insert in separate, non-transactional statements like this (on page 2, simulating the concurrent modification):

$nextId = mysql_query("SELECT MAX(t2.innerID) as a FROM insertTest t2 WHERE element='a'");
$nextId = mysql_fetch_array($nextId);
$nextId = $nextId["a"] +1;
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage) VALUES (null, 'a', $nextId, 'page2')");

leads to the error I was trying to avoid:

[screenshot: duplicate innerId inserted from both pages]

So why does it work in the concurrent scenario when each modification is a single query? Are inserts with subselects atomic?
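For completeness, the two-statement variant could presumably be made safe by wrapping it in a transaction with a locking read; a sketch in plain SQL (using a session variable instead of PHP, under the assumption that SELECT ... FOR UPDATE blocks competing locking reads until COMMIT):

START TRANSACTION;

-- Locking read: a competing transaction issuing the same locking read
-- blocks here until this transaction commits or rolls back.
SELECT COALESCE(MAX(innerId), 0) + 1
  INTO @nextId
  FROM insertTest
 WHERE element = 'a'
   FOR UPDATE;

INSERT INTO insertTest (id, element, innerId, fromPage)
VALUES (null, 'a', @nextId, 'page2');

COMMIT;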

dognose

2 Answers


Well, all (or almost all) databases support the necessary functionality for calculating innerId according to your rules. It is called a trigger, specifically a before insert trigger.

Your particular version will not work consistently in a multi-user environment. Few, if any, databases generate read locks on a table when starting an insert. That means that two insert statements issued very close together would generate the same value for innerid.

Because of concurrency considerations, you should do this calculation in the database, using triggers rather than on the application side.

You always have the possibility of calculating innerid when you need it, rather than when you insert the value. This is computationally expensive, requiring either an order by (using variables) or a correlated subquery. Other databases support window/analytic functions, making such a calculation much easier to express.
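Such a before insert trigger for the example table might look roughly like this (a sketch; MySQL allows a trigger to read, though not modify, the table it fires on, and the concurrency caveats discussed above still apply to the inner SELECT):

DELIMITER //

CREATE TRIGGER bi_myTable
BEFORE INSERT ON myTable
FOR EACH ROW
BEGIN
  -- Compute the next per-element id; COALESCE handles the first
  -- row of a new element type.
  SET NEW.innerId = (
    SELECT COALESCE(MAX(t.innerId), 0) + 1
      FROM myTable t
     WHERE t.element = NEW.element
  );
END//

DELIMITER ;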

Gordon Linoff
  • Thx for your answer. Yes, the innerId can be calculated; basically it's the *row number* within elements of a certain type. The thing is that (I hate that sentence) *the customer wants* to have it working for a huge amount of elements within a composite-pattern layout (limit of n = 5). The expectation is around several thousand inserts a day, so storing the generated `sub-sub-sub-id` would be better performance-wise: to find the element 125.5685.125.2145.3369, I could just pick innerId 3369 from the 5th innerId column (the other columns respectively). – dognose Nov 03 '14 at 20:17

From what I read here: "Atomicity multiple MySQL subqueries in an INSERT/UPDATE query?", your query seems to be atomic. I've tested it on my MySQL with InnoDB, with 4 different programs trying to execute the query 100000 times each. Afterwards I was able to create a combined unique key on (element, innerId) and it worked well, so the query didn't seem to generate duplicates. However, I got:

Deadlock found when trying to get lock

So you might want to consider this http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html

EDIT: It seems I could circumvent the deadlock by changing the SQL to

INSERT INTO test (id, element, innerId)
VALUES (null, 'b', (SELECT COUNT(*) FROM test t2 WHERE element = 'b' FOR UPDATE) + 1);

Oncaphillis
  • Please have a look at my update. I also played around with concurrent inserts, and it *seems* like an insert with subselects is atomic - at least I have no other explanation for the behaviour. – dognose Nov 03 '14 at 21:36
  • I think from the documentation and our observation we can be pretty sure that it's atomic. The case with the two separate statements may only work if you apply a read lock on the tables. – Oncaphillis Nov 03 '14 at 21:46
  • But what if the *delay* of the nested select is just because of - let's call it a *bug* - since the outer query starts with `INSERT`? Would it *still* be safe with regard to concurrent modification if the concurrent insert kicks in right between the *INSERT* and the *(sub-)SELECT* of "Statement 1"? (I cannot simulate this by calling two scripts, of course, because it would be a *window* of a few milliseconds.) – dognose Nov 03 '14 at 21:49
  • I guess if you want to be 100% sure there is no way around a read lock. – Oncaphillis Nov 03 '14 at 21:59
  • Don't get me wrong, I appreciate your conclusion based on the observation - but since I'm a developer at heart, I'm not settling for a 99.99999% solution :-) – dognose Nov 03 '14 at 22:00
  • @dognose Not offended at all ;-) but I had to deal a lot with database concurrency issues in the past, learning the hard way that transactions are **not** the perfect solution for all problems. But four parallel programs not colliding on an issue like this while busy inserting 10000 records gives me a pretty good feeling. Another issue, however, is speed and latency. If you're planning on dozens of different programs trying to insert thousands of records, paranoid locking might reduce your performance significantly. If you're dealing with dozens of insertions a day, you may go for the locking. – Oncaphillis Nov 03 '14 at 22:08
  • Well, the only difference between the *customer's requirement* and the *real world* is: the real world can adapt :P I recommended refactoring the whole thing, but of course it's about *supporting legacy code* - so the best I can do is deliver a 100% working solution. The performance issue is, of course, not avoidable when using read locks. But at least it will not produce *wrong* data, which is the most important thing imho. (Performance is ranked 2nd, of course - being forced to choose *either-or* is a dev's nightmare, I assume.) – dognose Nov 03 '14 at 22:18