0

The context :
I have a SQLite database with a table looking like this :
CREATE TABLE items (item TEXT, position INTEGER)

This table contain several million of entries. The column position is indexed.

My front-end occasionally build a bunch of updates to apply on the position column. The rows to be updated are not identified by their id, because the updates can be on a wide range of items, and list all id concerned can be very expensive. By example, a bunch can be update action like "add +10 to position >= 500 & position <= 10000" - "add -3 to position >= 100000 & position <= 100003", etc.

The problem : The conditional positions of the updates of a bunch of update are all based on the positions values before the bunched update. If I execute sequentially each update action of the bunched update, then there will probably (and actually there is) an "overlapping" problem after one of the update.

By example :

item | position
it1  | 1
it2  | 2
it3  | 3
it4  | 4
it5  | 5
it6  | 6
it7  | 7

If I have this bunch of update : "add + 2 to position >= 5 & position <= 6" - "add - 2 to position >= 3 & position <= 4" and I convert this to this SQLite queries :
UPDATE items SET position=position-2 WHERE position >= 5 AND position <= 6
UPDATE items SET position=position+2 WHERE position >= 3 AND position <= 4

I'm going to have this result :

 it1 | 1
 it2 | 2
 it3 | 5
 it4 | 6
 it5 | 5
 it6 | 6
 it7 | 7

Instead of what I want, that's to say :

 it1 | 1          {  it1 | 1  }
 it2 | 2          {  it2 | 2  }
 it3 | 5          {  it5 | 3  }
 it4 | 6    ===>  {  it6 | 4  }
 it5 | 3          {  it3 | 5  }
 it6 | 4          {  it4 | 6  }
 it7 | 7          {  it7 | 7  }

This is because of the "overlapping" from one operation to another one.

My first idea was to use CASE like this :

UPDATE items SET position=CASE WHEN position >= 5 AND position <= 6 THEN position-2 WHEN position >= 3 AND position <= 4 THEN position+2 ELSE position END
This solution work fine but it's very very slow, as it seem that SQLite execute this on the millions of entries of my table, even if the major party are not concerned by the update.

So I modified this like this :
UPDATE items SET position=CASE WHEN position >= 5 AND position <= 6 THEN position-2 WHEN position >= 3 AND position <= 4 THEN position+2 ELSE position END WHERE (position >= 5 AND position <= 6) OR (position >= 3 AND position <= 4)
This solution work fine and it's very quick, as SQLite execute the update only on the concerned rows.

The question : As I can have 10, or even perhaps 100 updates action to do in one bunch of update, the query can become very large, and this doesn't look very... "beautiful" / "nice".

Do you think there is a more beautiful way to handle this ? Can SQLite be able to execute some "UPDATE" queries by selecting the affected row in a first time, then effectively update the row after ?

Ideas ? Thought ?

Thank you very much !

2 Answers2

0

Welcome to SO.

I have a feeling that there is alternative solution, but persuant to your question, I would change your schema:

CREATE TABLE items (item TEXT, position INTEGER, position_tmp INTEGER)

Run your updates like this:

UPDATE items SET position_tmp = position + 1 WHERE 100 <= position AND position < 200
UPDATE items SET position_tmp = position - 3 WHERE 500 <= position AND position < 1000
...

Finally,

UPDATE items SET position = position_tmp

It may seem unpleasant to permanently have that extra position_tmp column (and it is), but SQLite does not support renaming or dropping columns, and even if it did, it is more performant to keep it.

EDIT:

I believe I now understand why you are trying to do this. You have millions of items that you want to impose an order on. If you remove one from the middle of the list, you have to adjust the rest to reflect the new order.

Depending on what you need, it may be possible to use floating point numbers to order them. For example, 0.1, 2.5, 5.0. If you want to insert something as the second element, insert with position 1.3. If you want to delete the last element, just delete it.

Otherwise, you basically are dealing with an age-old problem of having a list with fast insertion and removal.

Paul Draper
  • 78,542
  • 46
  • 206
  • 285
  • Thank you for answering. Effectively, this can be a solution. I had a similar idea: add a "flag" column. Before each bulk update, set all row flags to 0, and on update query, update row only with the flag "0", and set the flag to "1" in the same time. My biggest concern : this column take a lot of space (values stored millions of time), and take time to update. And I have the feeling that create an extra column to answer a logic problem is not a good thing. What is your feeling about an alternative solution ? Perhaps I'm wrong from the start of the problem. –  Oct 27 '13 at 14:19
  • In SQLite, there is no solution that is what I would call performant within the parameters you have described. – Paul Draper Oct 27 '13 at 18:02
  • Is there any documentation / blog / best practices about how to manage rows positioning in database ? –  Oct 27 '13 at 19:27
  • I don't know of a good resource, but my edited answer may be of interest. – Paul Draper Oct 27 '13 at 22:45
  • You may get some speed improvement using `UPDATE items SET position = position_tmp, position_tmp=NULL WHERE position_tmp IS NOT NULL;` – LS_ᴅᴇᴠ Oct 28 '13 at 10:03
  • 1
    @paul-draper : yes you got it. The floating point number is interesting. It look like the BASIC lines number which was incremented 10 by 10 to manage insertion / deletion. Why not. –  Oct 28 '13 at 16:39
  • @ls-dev: The usage of NULL is interesting. Thank you. –  Oct 28 '13 at 16:40
  • @JulienAvérous, yes :P. Except floating points have more granularity (essentially infinite in most cases). But nice analogy :/ – Paul Draper Oct 28 '13 at 19:55
0

You must see your problem algorithmically:

Adding A to items from position P1 to P2:

positions in [P1, P2] will become [P1+A, P2+A]
positions in [P2+1, P2+A] will become [P1, P1+A-1] (shift: -(P2-P1+1)=P1-P2-1)

SQL:

UPDATE items SET position=CASE position <= P2 THEN position+A ELSE position-P1-P2-1 WHERE position BETWEEN P1 AND P2+A;

For subtracting S to items from position Q1 to Q2:

positions in [Q1, Q2] will become [Q1-S, Q2-S]
positions in [Q1-S, Q1-1] will become [Q2-S+1, P2] (shift: P2-P1+1)

SQL:

UPDATE items SET position=CASE position >= Q1 THEN position-S ELSE position+P2-P1+1 WHERE position BETWEEN Q1-S AND Q2;

So, using one of these two queries, you will be able to update all relevant positions in a single statement. Just use appropriate (P1, P2, A) or (Q1, Q2, S).

LS_ᴅᴇᴠ
  • 10,823
  • 1
  • 23
  • 46
  • Hmm yes. Interesting. My problem with this is that there is still a problem with bunched update : initial positions of each "atomic" update in the bunch are all related to the initial positions of the row (before the bunched update). But I think the easier thing would be to fix the positions of each "atomic" update of the bunch to be relative to the previous update in the bunch, and use your solution. This bunch thing is like a "delta" update intended to merge some moving operations, to prevent too much updates from the front-end. –  Oct 28 '13 at 16:49