4

Consider a DynamoDB table consisting of a primary key and two attributes describing a start and an end date. How to query if a time range is overlapping the time ranges in the table without scanning the entire table?

Example: The dynamo table have two records

PK  Start        End
A   2019-01-01   2019-10-01
B   2019-06-01   2019-08-01

Query which records overlap the time range 2018-02-01 to 2019-03-01.

Collin Dauphinee
  • 13,664
  • 1
  • 40
  • 71
Peter Savnik
  • 763
  • 1
  • 8
  • 27
  • Everything I've read so far points at this not being possible. And index for this needs to use binary space partitioning (interval partitioning) which is not possible with a b-tree. – Chet Jun 17 '20 at 23:40

1 Answers1

3

Disclaimer: This answer is flawed, and does not account for ranges which start inside but end outside the query range, or ranges which are bigger than the query range.

As you are no doubt aware, DynamoDB is unable to utilize more than one index in a query.
In most databases, you could place an index on the "start" and "end" columns, the database engine would be able to fairly quickly determine the intersection of matching records.

In lieu of this functionality, we need a way to encode the range information into a single indexable field.

The way to do this is to utilize "Z-order indexing".
Z-order indexing is a way of encoding multi-dimensional information.

Z-order indexing, and how it can be applied to DynamoDB, is described in detail on this amazon blog post, part one, part two.

Essentially the way it works is by interleaving the data from the fields you want to query, you can do this at a binary level, or potentially at a string level as well.
A basic way it could be applied for a date range string would be interleaving your range "20190101" to "20191001" into a single field "2200119901100011"

start     end       interleaved
20190101  20191001  2200119901100011
20190601  20190801  2200119900680011

Then to query dates between "20190502" and "20190905", use the common prefix between the two dates, add one to the ending range(the logic for this is simpler with a binary representation).

interleaved BETWEEN "22001199005" AND "2200119901" AND start >= "20190502" AND end < "20190905"

Note, that using the interleaved index alone, is still approximate, and you still need to define explicit conditions for the start and end ranges.
However, this approach avoids scanning the whole table.
Of course, if you query a huge date range, it might end up querying the whole table anyway, the smaller the range, the more efficient the index will be.

Magnus
  • 7,952
  • 2
  • 26
  • 52
  • This is just brilliant – david_adler Jul 26 '21 at 21:51
  • 1
    I think this query will account nicely for ranges which are contained within the query range. However how do you account for ranges which start inside but end outside the query range? Or even ranges which are bigger than the query range? @Magnus – david_adler Jul 26 '21 at 21:57
  • I'm thinking you can do something like a bit mask the start positions or end positions of the query and then you can plug it in to the formula `query.start < item.end and item.start < query.end` – david_adler Jul 26 '21 at 22:43
  • 1
    @david_adler Good question, I didn't figure out the answer to that question, and ultimately didn't end up making use of Z-order indexing while I was investigating this, if you find out a good solution I would be keen to know. For now I've added a disclaimer about the limits of this approach, up front to the answer. – Magnus Jul 27 '21 at 06:50
  • 1
    My use case is a bit more constrained than the original question. I'm searching for events in a calendar. Most events fall on just one day and I only need to query by day, by week or by month. I represent each each event as `Year + MonthInYear + WeekInMonth + DayInWeek`. If an event is multi-day I create multiple rows. If an event sits on a week which straddles two months I create two rows for that event. I can then use a query+beginsWith search. e.g. Query by month `2021-07`, by week `2021-07-4` (4th week of July), by day `2021-07-4-1` (2nd day of 4th week of July same as `2021-08-0-1`). – david_adler Jul 27 '21 at 11:25
  • I encountered the problem of date overlap (no strict inside time interval). Inspired my your solution @Magnus I posted a new thread here: https://stackoverflow.com/questions/71360821/dynamodb-get-all-items-which-overlap-a-search-time-interval – gio Mar 05 '22 at 09:00