I designed a table schema for my Spanner database that takes advantage of interleaved tables to store associated records. I did this to optimize read times. My schema looks something like:
CREATE TABLE all_ids (
userid INT64 NOT NULL,
) PRIMARY KEY(userid);
CREATE TABLE connections_in (
userid INT64 NOT NULL,
child_userid INT64 NOT NULL,
connect_time TIMESTAMP NOT NULL,
) PRIMARY KEY(userid, child_userid),
INTERLEAVE IN PARENT all_ids ON DELETE CASCADE;
CREATE TABLE connections_out (
userid INT64 NOT NULL,
child_userid INT64 NOT NULL,
connect_time TIMESTAMP NOT NULL,
) PRIMARY KEY(userid, child_userid),
INTERLEAVE IN PARENT all_ids ON DELETE CASCADE;
The connections_in
and connections_out
tables store graph connections between nodes in a user database. Each user may have at most several hundred connections.
I also want to store the metadata history of each user over time and I'm not sure if it makes sense to interleave that data into the parent table all_ids
or not. Each user has several thousand rows of metadata through time:
CREATE TABLE user_metadata (
userid INT64 NOT NULL,
snapshot_time TIMESTAMP NOT NULL,
username STRING(1024) NOT NULL,
user_description STRING(1024) NOT NULL,
) PRIMARY KEY(userid, snapshot_time DESC),
INTERLEAVE IN PARENT all_ids ON DELETE CASCADE;
I know that spanner uses lexicographic order to order primary keys, so does that mean that a row from the user_metadata
table could be physically collocated next to a row from connections_in
or connections_out
since the first part of both primary keys is taken from the table all_ids
? And if so, does that mean read times from connections_in/_out
based on userid would be slower than if I created a separate non-interleaved table in the same database for user_metadata
like?:
CREATE TABLE user_metadata (
userid INT64 NOT NULL,
snapshot_time TIMESTAMP NOT NULL,
username STRING(1024) NOT NULL,
user_description STRING(1024) NOT NULL,
) PRIMARY KEY(userid, snapshot_time DESC);
Any help is greatly appreciated!