0

I have thousands of items in a table. Each item has a numeric attribute score. What I want to do:

  1. Given some targetScore, select all items where targetScore - n <= score <= targetScore + n (I got this part working).
  2. Return only the first j elements where j := ABS(targetScore - score) in ASC order.

Since the table is somewhat large, and will only grow over time, I'm hoping to have a way to delegate step #2 above to the DynamoDB engine. I'm currently selecting all item IDs from step #1, doing the computation for step #2 in process, then selecting the resulting items on a subsequent query.

NoSQLKnowHow
  • 4,449
  • 23
  • 35
rodrigo-silveira
  • 12,607
  • 11
  • 69
  • 123

1 Answers1

1

You don't really need step 1 to get step 2. I'm assuming that score is a sort key in your setup.

In step 2, you want to sort the items by abs(targetScore - score) and take the j lowest ones.

Although DynamoDB doesn't have any direct way to do that, the definition of the absolute value makes it quite easy to do. First, observe that `abs(targetScore - score) is defined as follows: It is:

  1. targetScore - score if score <= targetScore
  2. score - targetScore if score >= targetsScore

So the items with the smallest abs are either items with a score higher than targetScore but as low as possible, or items with a score lower than targetScore, but as high as possible.

So, you can do two separate DynamoDB queries:

  1. Query for the first j items with score >= targetScore, sorted in increasing score.
  2. Query for the first j items with score < targetScore, sorted in decreasing score.

DynanoDB can do both queries for you efficiently and cheaply - it involves a Query with a KeyConditions parameter and a Limit, and ScanIndexForward to do decreasing sorting. But no expensive and inefficient FilterExpression!

Once you have the j smallest items of each type 1 and type 2, all that remains is to pick the overall j smallest items from these 2*j items we got.

Nadav Har'El
  • 11,785
  • 1
  • 24
  • 45
  • How can I do a `Query` without a primary key? Also, `score` in my case is not a sort key, just some arbitrary attribute. – rodrigo-silveira Aug 11 '20 at 19:19
  • If score is not a sort key you can do what I said with FilterExpression but pay for reading the entire database... You need it to be a sort key for cheap requests... – Nadav Har'El Aug 11 '20 at 19:25
  • Gotcha. Thanks. About `Query`, though. Don't I need to specify a specific partition key in order to `Query`? What I want to do is fetch all possible items (across many partitions) where score is within a range. – rodrigo-silveira Aug 11 '20 at 19:35
  • Well, you have `Scan` for that, but that will be even more expensive - it's basically reading the *entire* database! I don't know if that's really what you want... – Nadav Har'El Aug 11 '20 at 19:37
  • Exactly my question. I'm currently doing a `Scan` because I can't think of a practical way to achieve what I want. Should I have a secondary index where everything has the same partition key so I can do the two queries you suggest? – rodrigo-silveira Aug 11 '20 at 20:05
  • That's an option, yes. But you will end up with the SI having everything in one partition, so you might as well just put everything in the same partition in the original table... I'm missing a lot of information about your use case. – Nadav Har'El Aug 11 '20 at 20:35