To clarify, is the following the correct understanding?
- Users table has primary key UserId
- Licenses table has primary key UserId, and license column
- A UserId can only be associated with 0 or 1 licenses
- Licenses table will only have a row when Licenses.license != NULL, so the number of rows in Licenses < number of rows in Users
- Both Users and Licenses tables contain many rows
In order to avoid scanning the Users table, perhaps the Licenses table can include even UserIds without licenses. Then, an Index on Licenses.license can be created to help search for licenses that are NULL.
Currently, the Licenses table does not have all the UserIds. If going forward you would like to populate Licenses even when the UserId has a NULL license, then one way is:
First update the application upon Users table insertion to also insert into Licenses even when license is NULL. Also update any other applicable areas of the application.
Once that change is deployed, do a one-time backfill of Licenses with all UserIds in Users that do not have a corresponding row in Licenses. Although this step may take some time, the scan over the Users table and corresponding check in the Licenses table can be done with a read-only transaction [1] which does not take locks.
[1] https://cloud.google.com/spanner/docs/transactions#read-only_transactions
- After the backfill is complete, create the Index on Licenses and update the application.
Alternatively, the application can track additional state when inserting/deleting UserIds in Users and inserting/deleting licenses in Licenses. For example, keep track of a MissingLicense table containing all UserIds without a license. The MissingLicense table would have a primary key UserId. When inserting a UserId without a license, insert into MissingLicense as well. When inserting a license for a user, delete from MissingLicense. When deleting a UserId from Users, delete from MissingLicense. MissingLicense can be interleaved under the Users table with ON DELETE CASCADE.