0

I have the following one-to-many relationship:

Account 1--* User

The Account contains global account-level information, which is mutable.

The User contains user-level information, which is also mutable.

When the user signs-in, they need both Account and User information. (I only know the UserId at this point).

I ideally want to design the schema such that a single query is necessary. However, I cannot determine how to do this without duplicating the Account into each User and thus requiring some background Lambda job to propagate changes to Account attributes across all User objects -- which, for the record, seems like more resource usage (and code to maintain) than simply normalizing the data and having 2 queries on each sign-in: fetch user, then fetch account (using an FK inside the user object that identifies the account).


Is it possible to design a schema that allows one query to fetch both and doesn't require a non-transactional background job to propagate updates? (Transactional batch updates are out of the question, since there's >25 users.) And if not, is the 2-query idea the best / an acceptable method?


Lawrence Wagerfield
  • 6,471
  • 5
  • 42
  • 84
  • One approach I've thought of for efficiently fetching parent+child, given the child ID, is to have the child ID contain the parent ID within it. Then given any child ID, you can perform a "batchGetItem" call passing the two PKs in. While this still costs the same as 2x queries, it's actually as fast as a single query given that "batchGetItem" will fetch the items in parallel... so you're only waiting on the longest query... but it's still just one query's worth of 'waiting' that you're doing. Assumes you're at the start of the project / can choose how IDs are generated. – Lawrence Wagerfield Nov 10 '20 at 23:00

2 Answers2

2

I'll focus on one angle in your question - the 2-query idea. In many cases it is indeed an acceptable method, better than the alternatives. In fact in many NoSQL uses, every user-visible request results in significantly more than two database requests. In fact, it is often stated that this is the reason why NoSQL systems care about low tail latencies (i.e., even 99th percentile latencies should be low).

You didn't say why you wanted to avoid the 2-query solution. The 2-query implementation you presented has two downsides:

  1. It is more costly: you need to do two queries instead of one, costing (when the reads are shorter than 4 KB) double than a single read.
  2. Latency doubles if you need to do the first query, and only then can do the second query.

There may be tricks you can use to solve both problems, depending on more details of your use case:

For the latency: You didn't say what is a "user id" in your application. If it is some sort of unique numeric identifier, maybe it can be set up such that the account id can be determined from the user id directly, without a table lookup (e.g., the first bits of the user id are the account id). If this is the case, you can start both lookups at the same time, and not double the latency. The cost will still be double, but not the latency.

For the cost: If there is a large number of users per account (you said there are more than 25 - I don't know if it's much more or not), it may be useful to cache the Account data, so that not every user lookup will need to read the Account data again - it might often be cached. If Account information rarely changes and consistency of it is not a big deal (I don't know if it is...), you can also get by with doing an "eventual consistency" read for the Account information - which costs half of the regular "consistent" read.

jellycsc
  • 10,904
  • 2
  • 15
  • 32
Nadav Har'El
  • 11,785
  • 1
  • 24
  • 45
  • Thank you @Nadav, very helpful indeed! As to "You didn't say why you wanted to avoid the 2-query solution.": in the re:Invent videos, Rick H implies the main goal is to get everything into a single query _and_ if you normalize your data at all / use FKs, you'll be hit with the naughty stick. The composite key / account ID in the user ID idea is neat (I'd actually posted this in a comment, so it must be a common pattern!) and is the approach I'll go down. Thank you for lifting the mental ban I had on multiple queries! (Case-by-case of course.. sign-ins won't be a high-frequency op in this app.) – Lawrence Wagerfield Nov 10 '20 at 23:33
0

I think the following scheme will be useful for.

  • You will store both account and user records inthe same table
  • You want to get both account metadata and linked users in a single query
PK: account   SK: recordId

=== Account record ===
account: 123512321   recordId: METADATA   attributes: name, environment, ownerId...

=== User record ===
account: 123512321   recordId: USERID#34543543   attributes: name, email, phone...

With this denormalization of the data, you can retrieve both account metadata and related users in a single query. You can also change the account metadata without a need to apply any change to related users.

BONUS: you can also link other types of assets to the account record

Alberto S.
  • 1,805
  • 23
  • 39