4

I have a table in Scylla

CREATE TABLE event (
    eventSource TEXT,        
    createdAt TIMEUUID,
    eventData TEXT,
    PRIMARY KEY (eventSource, createdAt)
) WITH cdc = {'enabled':true};

now I wish to track inserts/updates done to this table for a particular partition key using CDC

I know that Stream ID in CDC depends on generation and base table partition key, but how do I get this stream ID without doing a full table scan for CDC table?

Is there some function that would convert a given partition key to a stream id for currently active generation?

let4be
  • 1,048
  • 11
  • 30

1 Answers1

5

It's all documented here.

In short, there's a special table which stores all the streams for different generations.

You can query it like this:

SELECT streams FROM system_distributed.cdc_description WHERE time = '2020-03-25 16:05:29.484+0000';

There's no easy way to find stream_id for a partition_key. It can be reversed engineered though:

  1. stream_id is a blob of 16 bytes. The first 8 bytes are the value of the token that is assigned to this stream.
  2. All partition keys that live in the same vnode and shard as this token, belong to this stream.
haaawk
  • 330
  • 1
  • 8
  • Is there any reason such function is not made available? Why the need to reverse engineer and bomb users with extra complexity and implementation details? – let4be Apr 16 '20 at 19:14
  • 2
    This is open source software. We require your input and suggestions for functionality that makes the database work better for you. Why isn't it in there? I don't know - maybe priorities, and they haven't built it yet. Or maybe nobody thought it would be useful? I promise the intent was not malicious. :) I would recommend searching to see if the request has been made, and, if not, filing an issue with the request - https://github.com/scylladb/scylla – Greg Apr 20 '20 at 21:04