2

My application manages bookings of a user. These bookings are composed by a start_date and end_date, and their current partition in dynamodb is the following:

PK                  SK              DATA
USER#1#BOOKINGS     BOOKING#1       {s: '20190601', e: '20190801'}

[GOAL] I would query all reservations which overlap a search time interval as the following:

enter image description here

I tried to find a solution for this issue but I found only a way to query all items inside a search time interval, which solves only this problem:

enter image description here

I decided to make an implementation of it to try to make some change to solve my problem but I didn't found a solution, following you can find my implementation of "query inside interval" (this is not a dynamodb implementation, but I will replace isBetween function with BETWEEN operand):

import { zip } from 'lodash';

const bookings = [
  { s: '20190601', e: '20190801', i: '' },
  { s: '20180702', e: '20190102', i: '' }
];

const search_start = '20190602'.split('');
const search_end   = '20190630'.split('');

// s:20190601 e:20190801 -> i:2200119900680011
for (const b of bookings) {
  b['i'] = zip(b.s.split(''), b.e.split(''))
    .reduce((p, c) => p + c.join(''), '');
}

// (start_search: 20190502, end_search: 20190905) => 22001199005
const start_clause: string[] = [];
for (let i = 0; i < search_start.length; i += 1) {
  if (search_start[i] === search_end[i]) {
    start_clause.push(search_start[i] + search_end[i]);
  } else {
    start_clause.push(search_start[i]);
    break;
  }
}
const s_index = start_clause.join('');

// (end_search: 20190905, start_search: 20190502) => 22001199009
const end_clause: string[] = [];
for (let i = 0; i < search_end.length; i += 1) {
  if (search_end[i] === search_start[i]) {
    end_clause.push(search_end[i] + search_start[i]);
  } else {
    end_clause.push(search_end[i]);
    break;
  }
}
const e_index = (parseInt(end_clause.join('')) + 1).toString();

const isBetween = (s: string, e: string, v: string) => {
  const sorted = [s,e,v].sort();
  console.info(`sorted: ${sorted}`)
  return sorted[1] === v;
}


const filtered_bookings = bookings
  .filter(b => isBetween(s_index, e_index, b.i));

console.info(`filtered_bookings: ${JSON.stringify(filtered_bookings)}`)
gio
  • 801
  • 3
  • 15
  • 27
  • Is your goal to find overlapping bookings for a specific user ID, or overlapping bookings across all users? Presume the latter. – jarmod Mar 05 '22 at 21:37
  • This is related for a specific user (search inside a partition) – gio Mar 06 '22 at 09:09
  • If you don't have a lot of bookings for a single user ID then you could simply query all bookings for a specific user ID and then filter on start/end dates. You could filter in the DynamoDB query or locally after you retrieve all items for that user ID. Any reason this would not work for you? – jarmod Mar 06 '22 at 16:44
  • I already implement a solution based on KeyConditionExpression on end_date and a FilterExpression on start_date. In this way I can get all bookings inside my interval + all bookings which are more recent than my interval. Through FilterExpression I exclude the second subset, so it remains all bookings of my interval. – gio Mar 09 '22 at 07:51
  • This approach doesn't optimize "request for old bookings", because KeyCondition doesn't exclude much bookings for old interval. My application generates around 5000 bookings a year for user. In 5 years I should perform a query (which is pratically a scan) on 25000 bookings. I think it could be not performant. Of course I could switch Key Condition and FilterExpression for old queries. But my question is about find a simple and elegant solution too perform this type of query on Dynamodb and it seems there's not. – gio Mar 09 '22 at 07:52

2 Answers2

0

There’s not going to be a beautiful and simple yet generic answer.

Probably the best approach is to pre-define your time period size (days, hours, minutes, seconds, whatever) and use the value of that as the PK so for each day (or hour or whatever) you have in that item collection a list of the items touching that day with the sort key of the start time (so you can do the inequality there) and you can use a filter on the end time attribute.

If your chosen time period is days and you need to query across a week then you’ll issue seven queries. So pick a time unit that’s around the same size as your selected time periods.

Remember you need to put all items touching that day (or whatever) into the day collection. If an item spans a week it needs to be inserted 7 times.

hunterhacker
  • 6,378
  • 1
  • 14
  • 11
0

Disclaimer: This is a very use-case-specific and non-general approach I took when trying to solve the same problem; it picks up on @hunterhacker 's approach.

Observations from my use case:

  1. The data I'm dealing with is financial/stock data, which spans back roughly 50 years in the past up to 150 years into the future.
  2. I have many thousands of items per year, and I would like to avoid pulling in all 200 years of information
  3. The vast majority of the items I want to query spans a time that fits within a year (ie. most items don't go from 30-Dec-2001 to 02-Jan-2002, but rather from 05-Mar-2005 to 10-Mar-2005)

Based on the above, I decided to add an LSI and save the relevant year for every item whose start-to-end time is within a single year. The items that straddle a year (or more) I set that LSI with 0.

The querying looks like:

if query_straddles_year:
   # This doesn't happen often in my use case
   result = query_all_and_filter_after()
else:
   # Most cases end up here (looking for a single day, for instance)
   year_constrained_result = query_using_lsi_for_that_year()
   result_on_straddling_bins = query_using_lsi_marked_with_0() # <-- this is to get any of the indexes that do straddle a year
   filter_and_combine(year_constrained_result, result_on_straddling_bins)
schuess
  • 1,009
  • 1
  • 10
  • 21