
Google Spanner's documentation recommends against using monotonically increasing values, such as timestamps or sequential numbers, as the first part of a primary key or index, which makes sense given the architecture. However, for my requirements, I do need some way of ensuring a strict “append only” ordering of rows.

I’m using Spanner to model events (as in event sourcing). Each event has a category, a stream id identifying a sequence within which events need to be strictly ordered relative to each other, and a few payload fields; I'll ignore the actual payload from here on.

Naively, this would be modelled as:

| Category    | STRING       |
| Stream Id   | STRING       |
| Sequence Nr | INT64        |

(With a primary key consisting of Category, Stream Id, Sequence Nr.) This would ensure a strong ordering of events within one stream. However, some categories have a lot of events associated with them, and Spanner best practice is to have high variance in the leading part of the key, so it would be better to flip this around. Each "stream" will contain a reasonably small number of events (thousands, not millions) and will be read together, so this layout distributes data better while keeping events belonging to one stream local to each other:

| Stream Id   | STRING       |
| Category    | STRING       |
| Sequence Nr | INT64        |

However, as I'd like to be able to append events without having to read the current state to find out the current sequence number, I'd rather use a timestamp.

| Stream Id    | STRING      |                         |
| Category     | STRING      |                         |
| Timestamp    | TIMESTAMP   | allow_commit_timestamp  |
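
For reference, a rough sketch of the DDL I have in mind for this layout; the table and column names (Events, StreamId, CommitTimestamp, Payload) and the STRING sizes are just my own placeholders:

    CREATE TABLE Events (
      StreamId        STRING(64)  NOT NULL,
      Category        STRING(64)  NOT NULL,
      CommitTimestamp TIMESTAMP   NOT NULL OPTIONS (allow_commit_timestamp = true),
      Payload         STRING(MAX)
    ) PRIMARY KEY (StreamId, Category, CommitTimestamp);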

Spanner has a built-in commit timestamp (the allow_commit_timestamp column option) that tags a row with the time at which its transaction was actually committed. But finally, to the question:

Is it possible to represent data as above and get unique commit timestamps even if I commit multiple events in one transaction?

If not, is it possible to ensure strict ordering some other way, by adding additional columns to ensure order?

The documentation states that "Commit timestamp values are not guaranteed to be unique. Transactions that write to non-overlapping sets of fields might have the same timestamp. Transactions that write to overlapping sets of fields have unique timestamps." - but I'm not quite clear on what constitutes "sets of fields" in this context.

The documentation also states that "The commit timestamp makes creating the changelog easier, because the timestamps can enforce ordering of the changelog entries." but it's not clear what ordering guarantees commit timestamps actually provide in the presence of multiple concurrent writers, or when multiple events are written at the same time.


1 Answer


If you have multiple events in the same transaction then they will all have the same commit timestamp.

A field is a table cell (one column value in one row). So 'non-overlapping sets of fields' in this context basically means separate rows, because one of the fields being written is the commit timestamp itself!

Two independent transactions, one updating row 'R1' and one updating row 'R2' on the same table can theoretically have the same commit timestamp as they are not overlapping.

Is it possible to represent data as above and get unique commit timestamps even if I commit multiple events in one transaction?

In the example you give where you are using commit timestamp in your primary key, then no, you would not be able to add multiple events to the same stream_id/category pair in a single transaction, as they would have the same timestamp -- and therefore same primary key.
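
For example, with a table shaped like the one in the question (using the placeholder names from the DDL sketch above and Spanner's PENDING_COMMIT_TIMESTAMP() DML function), both of these rows would end up with the same primary key:

    -- Both rows get the same commit timestamp in this transaction, so they
    -- would have identical primary keys and the insert fails.
    INSERT INTO Events (StreamId, Category, CommitTimestamp, Payload)
    VALUES ('stream-1', 'orders', PENDING_COMMIT_TIMESTAMP(), 'order-created'),
           ('stream-1', 'orders', PENDING_COMMIT_TIMESTAMP(), 'order-paid');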

If not, is it possible to ensure strict ordering some other way, by adding additional columns to ensure order?

If you use a combination of the commit timestamp and a sequence_number for each (stream_id, category, timestamp) tuple, then you can keep strict ordering within a single transaction:

Increment a sequence number, starting at 0, for each (stream_id, category) pair in the same transaction. The commit timestamp would then ensure the order across different transactions, and the sequence number would ensure the order within the transaction...
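
A minimal sketch of that approach, assuming the Events table from the question is extended with a SequenceNr column as the last part of the primary key (names are again placeholders):

    -- Assumed key: PRIMARY KEY (StreamId, Category, CommitTimestamp, SequenceNr)

    -- Append two events in one transaction: they share the commit timestamp,
    -- and the per-transaction sequence number keeps them distinct and ordered.
    INSERT INTO Events (StreamId, Category, CommitTimestamp, SequenceNr, Payload)
    VALUES ('stream-1', 'orders', PENDING_COMMIT_TIMESTAMP(), 0, 'order-created'),
           ('stream-1', 'orders', PENDING_COMMIT_TIMESTAMP(), 1, 'order-paid');

    -- Read the stream back in strict append order.
    SELECT CommitTimestamp, SequenceNr, Payload
    FROM Events
    WHERE StreamId = 'stream-1' AND Category = 'orders'
    ORDER BY CommitTimestamp, SequenceNr;

The sequence number only needs to be unique within one transaction, so it can always restart at 0 and never requires reading the current state first.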

RedPandaCurios
  • Thank you, that all makes sense. Now, if I introduce a "batch local" sequence number, what would happen in the unlikely situation that two writers attempt to add to the same stream at the same time? As the timestamp and batch sequence id would be part of the primary key, they would both attempt to insert the same tuple (stream id, category, timestamp, 0), so I assume one of them would fail in the same way as if inserting a duplicate key manually? Or are they guaranteed to be inserted with different timestamps in this scenario? – SoftMemes Dec 05 '18 at 15:02
  • If using INSERT, and both occurred at exactly the same time, then one will most likely fail with a duplicate key error, so you would need to retry the transaction. (If using INSERT OR UPDATE then one would overwrite the other, so don't do this!) Although I believe this would be impossible in practice, because the same Spanner node will be handling both transactions and one transaction _will_ be handled before the other, it is best to handle the error and retry anyway! – RedPandaCurios Dec 06 '18 at 15:35