
I have a table in a database that stores ordered/reorderable lists. It has the following shape:

| id | listId | index | title | ... |

where id is the primary key, listId is a foreign key identifying which list the item belongs to, and title and the other columns hold the item's contents. The index column determines the item's position within its list. It is an integer counter (starting at 0) that is unique within the scope of a list, but may repeat across lists. Example data:

| id      | listId  | index | title    | ...
---------------------------------------------
| "item1" | "list1" | 0     | "title1" | ...
| "item2" | "list1" | 1     | "title2" | ...
| "item3" | "list1" | 2     | "title3" | ...
| "item4" | "list2" | 0     | "title4" | ...
| "item5" | "list2" | 1     | "title5" | ...

Users can create/delete items, move them inside the list or across lists. To ensure consistency of indexes when running these operations, I do the following:

Create item:

  1. Count the items within the list:
SELECT COUNT(DISTINCT "Item"."id") as "cnt" 
FROM "item" "Item" 
WHERE "Item"."listId" = ${listId}
  2. Insert the new item, with its index set to the count from step 1:
INSERT INTO "item"("id", "listId", "index", "title", ...) 
VALUES (${id}, ${listId}, ${count}, ${title})

This way the index grows with each item inserted into the list.
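
For illustration, the count and the insert could also be collapsed into a single statement (a sketch only, PostgreSQL-style syntax assumed; note that two concurrent transactions can still compute the same count unless a unique constraint or a sufficiently strict isolation level prevents it):

-- compute the next index inside the INSERT itself instead of a separate SELECT COUNT
INSERT INTO "item"("id", "listId", "index", "title")
SELECT ${id}, ${listId}, COUNT(*), ${title}
  FROM "item"
 WHERE "listId" = ${listId}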

Move item:

  1. Retrieve item's current listId and index:
SELECT "Item"."listId" AS "Item_listId", "Item"."index" AS "Item_index" 
FROM "item" "Item" 
WHERE "Item"."id" = ${id}
  2. Change the index of the "shifted" items if necessary, so that the order stays consistent. For example, if the item is moved forward, all items between its current position (exclusive) and its new position (inclusive) need their index decreased by 1:
UPDATE "item" 
  SET "index" = "index" - 1 
WHERE "listId" = ${listId} 
  AND "index" BETWEEN ${sourceIndex + 1} AND ${destinationIndex}

I'll omit the variation with movement across lists because it is very similar.

  3. Update the item itself:
UPDATE "item" 
   SET "index" = ${destinationIndex} 
WHERE "id" = ${id}

Delete item:

  1. Retrieve item's index and listId

  2. Move all items in the same list that come after this item one step back, to close the gap:

UPDATE "item" 
  SET "index" = "index" - 1 
WHERE "listId" = ${listId} 
  AND "index" > ${itemIndex}
  3. Delete the item:
DELETE FROM "item" 
WHERE "id" = ${id}

The question is:

What transaction isolation level should I use for each of these operations? It is very important for me to keep the index column consistent: no gaps and, most importantly, no duplicates. Am I right that the create-item operation is subject to phantom reads, because it counts items by some criteria, and should therefore be serializable? What about the other operations?

Kit Isaev
    Regardless of what isolation level you choose and how exactly you are changing the data in your table, you should enforce your most important constraint - no duplicates - using the unique index on `(ListID, Index)`. The unique index constraint is trivial to implement and will always hold. – Vladimir Baranov Jul 29 '19 at 06:54
  • When a record is deleted, lowering the Id of all records with an Id greater than the deleted record's by 1 may cause performance problems when the amount of data is large. My suggestion: 1. Create a sequence object and use it for new Ids; don't make the Id column auto-increment, manage the values manually using the sequence. 2. When a record is deleted, decrease the Ids of records with an Id greater than the deleted record's by 1, and decrease the sequence value by 1 in the same transaction. 3. Use 'Read uncommitted' as the isolation level. – Gurcan Aug 01 '19 at 10:27
  • And I warn you: if you use the Id for operations in the UI, updating Ids when a record is deleted will cause huge operational problems, because someone else may be editing another record with a bigger Id at the same time. – Gurcan Aug 01 '19 at 10:40
  • @Gurcan I need ids to be persistent/order independent, and also prefer them to be uuids, not integers. So id and index are separate columns. Incrementing/decrementing record indexes is not an issue for me since a column is not intended to have more than ~10 items. – Kit Isaev Aug 01 '19 at 14:31
  • So, it looks like the base design is okay. The index-sequence mechanism can work the way I suggested. Do you agree? – Gurcan Aug 01 '19 at 14:45

2 Answers


Given your constraints, you can create a unique index on the two columns: (listId, index) can be defined as unique. That will prevent duplicates.
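
For example (a sketch; PostgreSQL-style quoting assumed and the index name is made up):

CREATE UNIQUE INDEX "item_listId_index_uq" ON "item" ("listId", "index");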

Additionally, to avoid gaps, I would recommend running:

select * from (
  select i1.listId, i1.index,
         (select min(i2.index) from Item i2
           where i2.listId = i1.listId and i2.index > i1.index) as nextIndex
    from Item i1
   where i1.listId = :listId
) gaps
where nextIndex - index > 1

at the end of each transaction. Together with the "Repeatable Read" transaction isolation level, and rolling back and retrying the transaction if either the unique constraint fails or the statement I suggested returns a record, this should meet your requirements.
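
Roughly, per operation, that could look like this (a sketch only, PostgreSQL-style syntax assumed since the DBMS isn't stated; the retry loop itself would live in application code):

BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... run the create/move/delete statements for the affected list ...
-- ... then run the gap-detection query above; if it returns a row, ROLLBACK and retry ...
COMMIT;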

aschoerk
  • The idea to use a select condition as some sort of "emulated constraint" is smart, I didn't think of it. Though unfortunately a unique index won't work because the columns are not exactly unique. I move/delete items in 2 write operations (first move the adjacent items, then move/delete the item itself), and operation 1 would result in this index being violated – Kit Isaev Jul 31 '19 at 10:03
  • Depending on the dbms, the unique key constraints may be checked at the end of the transaction (see: https://www.postgresql.org/docs/9.1/index-unique-checks.html); then you would not have a problem with the move. Otherwise, extend the proposed select statement to detect identical indexes but different ids. – aschoerk Jul 31 '19 at 14:10
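
For what it's worth, in PostgreSQL a plain unique index is checked row by row; postponing the check to commit time requires a deferrable unique constraint instead (a sketch, constraint name made up):

ALTER TABLE "item"
  ADD CONSTRAINT "item_listId_index_uq" UNIQUE ("listId", "index")
  DEFERRABLE INITIALLY DEFERRED;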

Without knowing more about your specific application, the safest bet is indeed to use serializable as the isolation level whenever you access that table, but even that level may not be sufficient for your specific case.

A unique constraint on (listId, index) would prevent duplicates (what about the title? can it be repeated in the same list?), some carefully crafted "watchdog" queries can further mitigate issues, and database sequences or stored procedures can help ensure that there are no gaps, but the truth is that the mechanism itself seems fragile.
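
In practice that would mean running each complete list operation in its own serializable transaction (a sketch, PostgreSQL-style syntax assumed):

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... all statements for one create/move/delete operation ...
COMMIT; -- catch serialization failures (SQLSTATE 40001) in the application and retry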

Knowing only so much of your specific problem, what you appear to have is a concurrency problem at the user level, in the sense that several users can access the same objects at the same time and make changes to them. Assuming this is your typical web application with a stateless back-end (hence inherently distributed), this may carry a large number of implications for the user experience, reflecting on the architecture and even the functional requirements. Say, for example, that user Foo moves item Car to List B, which is currently being worked on by user Bar. It is then legitimate to assume that Bar will need to see item Car as soon as the operation is completed, but that will not happen unless there's some mechanism in place to immediately notify users of List B of the change. The more users you have working on the same set of lists, the worse it becomes, even with notifications, as you would have more and more of them, up to the point where users see things changing all the time and just can't keep up with it.

There are a lot of assumptions anyone will have to make to provide you with an answer. My own lead me to state that you probably need to revise the requirements for that application, or ensure that management is aware of several limitations and that they accept them. This type of problem is pretty common in distributed applications. Usually "locks" on certain sets of data are placed (either through the database or shared memory pools) so that only one user can alter them at any given time or, alternatively, a workflow is provided to manage conflicting operations (much like versioning systems). When neither is done, a log of operations is kept to understand what happened and rectify problems later on should they be detected.

Filippo Possenti
  • The application is a trello-like board. Updates are delivered to the frontend in realtime via websocket, and as boards are limited to ~20 users I'm not expecting lots of collisions where the same card is being moved by multiple users simultaneously. On the other hand, the frontend rendering/UI logic relies on consistent index ordering; duplicate indexes or gaps can lead to undefined behaviour. And it seems cleaner to guarantee this consistency on the backend than to try to make the frontend aware of possible inconsistency. – Kit Isaev Aug 06 '19 at 21:17
  • In that case, "serializable" may be excessive. Try with "Repeatable Read" but, at the same time, implement some logic on the server side (not on the database side) to serialize operations done on the same board. One way could be to construct an object for each "currently active" board and then use ***synchronized*** on that object to block users from gaining access at the same time, thus serializing the operations. Another (probably better) way is to use some sort of queue mechanism that serializes operations and returns feedback on the websocket. – Filippo Possenti Aug 07 '19 at 07:06