0

I have two very big tables interleaved by its primary key (just one column, so it is one-to-one relationship). A few rows on the parent table have no child on the other and I want to find them.

Currently, I am doing a JOIN query and searching by NULL values:

         SELECT Users.userID
            FROM Users
            LEFT JOIN Licenses
            ON Users.userID = Licenses.userID
            WHERE Licenses.license IS NULL

But this query still needs to read all Users table to do the JOIN, what is really slow.

I know that if the license column was in the table Users I could create an index with it and would only need to read the rows with NULL license values, but it is not an option to put the column "license" in the same "Users" table.

Is there a way to just pass through the userIDs that do not have yet a license using different tables? e.g. an index with columns from different tables. (I am using interleaved but would it be better foreign keys?)

1 Answers1

0

To clarify, is the following the correct understanding?

  • Users table has primary key UserId
  • Licenses table has primary key UserId, and license column
    • A UserId can only be associated with 0 or 1 licenses
    • Licenses table will only have a row when Licenses.license != NULL, so the number of rows in Licenses < number of rows in Users
  • Both Users and Licenses tables contain many rows

In order to avoid scanning the Users table, perhaps the Licenses table can include even UserIds without licenses. Then, an Index on Licenses.license can be created to help search for licenses that are NULL.

Currently, the Licenses table does not have all the UserIds. If going forward you would like to populate Licenses even when the UserId has a NULL license, then one way is:

  1. First update the application upon Users table insertion to also insert into Licenses even when license is NULL. Also update any other applicable areas of the application.

  2. Once that change is deployed, do a one-time backfill of Licenses with all UserIds in Users that do not have a corresponding row in Licenses. Although this step may take some time, the scan over the Users table and corresponding check in the Licenses table can be done with a read-only transaction [1] which does not take locks.

[1] https://cloud.google.com/spanner/docs/transactions#read-only_transactions

  1. After the backfill is complete, create the Index on Licenses and update the application.

Alternatively, the application can track additional state when inserting/deleting UserIds in Users and inserting/deleting licenses in Licenses. For example, keep track of a MissingLicense table containing all UserIds without a license. The MissingLicense table would have a primary key UserId. When inserting a UserId without a license, insert into MissingLicense as well. When inserting a license for a user, delete from MissingLicense. When deleting a UserId from Users, delete from MissingLicense. MissingLicense can be interleaved under the Users table with ON DELETE CASCADE.

Rose Liu
  • 251
  • 1
  • 5
  • Yes, this is the correct understanding. I thought about first idea, but currently the table has not all the userIDs. Any idea on how to populate those IDs with NULL licenses? – Gabriel Mendes Sep 26 '22 at 14:05
  • I updated the above answer with an idea on how to populate Licenses with UserIds with NULL licenses. – Rose Liu Sep 26 '22 at 18:11
  • It worked! It was kinda tedious to do the backfill but now it's all matching the Users table. Thank you so much! – Gabriel Mendes Sep 30 '22 at 12:31