I am trying to learn graphdb and compare its capabilities with relational databases. Consider the following problem:

I have two lists of date ranges: date-in and date-out:

Date-in date ranges:

  • 1/1/2000-12/31/2025
  • 1/1/2026-12/31/2030
  • 2/1/2030-12/31/2033

Date-out date ranges:

  • 2/1/2005-12/31/2020
  • 1/1/2024-12/31/2026

The calculation that I want to do is to subtract all the Date-out date ranges from the Date-in date ranges, meaning that I want to know all the date ranges described in the Date-in ranges that are not described in the Date-out ranges. Note the date ranges may overlap.

The correct answer is:

  • 1/1/2000-1/31/2005
  • 1/1/2021-12/31/2023
  • 1/1/2027-12/31/2030
  • 2/1/2030-12/31/2033

I know how to solve this problem using a relational db such as Postgres. The relational db solution would be:

  1. Use generate_series() to list all the days described in the Date-in ranges and the Date-out ranges. There are ~33 years here, so about 12,000 days total.

  2. Select all the days from the Date-in days list that are not present in the Date-out days list. This should be fast, because again, it's only ~33 years, so about 12,000 days total.

  3. Use the "island" detection SQL query https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/ to find contiguous date ranges from the resulting list of days from Step 2.
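For illustration, the three steps above can be sketched outside the database. This is a minimal Python sketch, not the Postgres query itself: the helper names `expand_days` and `find_islands` are hypothetical, and ranges are assumed to be inclusive `(start, end)` pairs of `datetime.date` values.

```python
from datetime import date, timedelta

def expand_days(ranges):
    """Step 1: collect every calendar day covered by inclusive (start, end) ranges."""
    days = set()
    for start, end in ranges:
        for offset in range((end - start).days + 1):
            days.add(start + timedelta(days=offset))
    return days

def find_islands(days):
    """Step 3: collapse a set of days into contiguous inclusive (start, end) ranges."""
    islands = []
    for day in sorted(days):
        if islands and day - islands[-1][1] == timedelta(days=1):
            # The day directly follows the current island, so extend it.
            islands[-1] = (islands[-1][0], day)
        else:
            # A gap was found, so start a new island.
            islands.append((day, day))
    return islands

date_in = [
    (date(2000, 1, 1), date(2025, 12, 31)),
    (date(2026, 1, 1), date(2030, 12, 31)),
    (date(2030, 2, 1), date(2033, 12, 31)),
]
date_out = [
    (date(2005, 2, 1), date(2020, 12, 31)),
    (date(2024, 1, 1), date(2026, 12, 31)),
]

# Step 2: set difference between the Date-in days and the Date-out days.
remaining = expand_days(date_in) - expand_days(date_out)
islands = find_islands(remaining)
```

Because the last two Date-in ranges overlap, the day-based approach reports them as one island, 1/1/2027-12/31/2033, rather than as two separate ranges.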

I don't have an approach to solve this problem using a graphdb Gremlin traversal. I tried the same approach in graphdb as in the relational db, but as far as I'm aware there is no generate_series() in the AWS Neptune implementation of graphdb. Furthermore, I didn't want to add nodes to a graphdb just to run a read-type query like this one.

Is there a graphdb solution for this problem?

Stanislav Kralin
leontp587
  • This is perhaps not the type of calculation that is easily done in Gremlin. There are some predicates such as `within`, `without` and `between` that can be used to check whether a value lies within a range but there is not an equivalent to `generate_series`. There is a `math` step but I think in general this calculation is better done in the application. – Kelvin Lawrence Sep 05 '20 at 23:10

1 Answer

Expanding on my comment as an answer.

This is perhaps not the type of calculation that is easily done in Gremlin. There are some predicates such as `within`, `without` and `between` that can be used to check whether a value lies within a range, but there is no equivalent to `generate_series`. There is a `math` step that offers scientific calculator functionality, but I think in general this calculation is better done in the application.

Gremlin does allow in-line code (lambdas/closures) to be included in a query, but many graph database engines do not permit it, because allowing arbitrary code as part of a graph traversal can expose any number of risks. Even if it were allowed, I feel that this operation is probably best done in the application rather than by trying to come up with a query to do it.
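As a rough idea of what "in the application" could look like, here is a minimal Python sketch. It assumes the date ranges have already been fetched from the graph as inclusive `(start, end)` pairs of `datetime.date` values; the function name `subtract_ranges` is hypothetical. It works on range boundaries directly, so no per-day expansion is needed.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def subtract_ranges(date_in, date_out):
    """Return the parts of each inclusive date_in range not covered by any date_out range."""
    result = []
    cuts = sorted(date_out)  # process the ranges to remove in start order
    for start, end in date_in:
        cursor = start  # first day of this range not yet accounted for
        for out_start, out_end in cuts:
            if out_end < cursor or out_start > end:
                continue  # this cut does not touch the remaining part
            if out_start > cursor:
                # Keep the uncovered stretch that precedes the cut.
                result.append((cursor, out_start - ONE_DAY))
            cursor = max(cursor, out_end + ONE_DAY)
            if cursor > end:
                break  # the rest of this range is fully removed
        if cursor <= end:
            result.append((cursor, end))
    return result

date_in = [
    (date(2000, 1, 1), date(2025, 12, 31)),
    (date(2026, 1, 1), date(2030, 12, 31)),
    (date(2030, 2, 1), date(2033, 12, 31)),
]
date_out = [
    (date(2005, 2, 1), date(2020, 12, 31)),
    (date(2024, 1, 1), date(2026, 12, 31)),
]

for start, end in subtract_ranges(date_in, date_out):
    print(start.isoformat(), "to", end.isoformat())
```

With the ranges from the question, this yields four ranges: 1/1/2000-1/31/2005, 1/1/2021-12/31/2023, 1/1/2027-12/31/2030 and 2/1/2030-12/31/2033. Each Date-in range is handled separately, so overlapping Date-in ranges are not merged; merging them first would be a small extra step if a single combined range is preferred.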

In simpler cases, you can test whether a value lies within a date range using something like:

gremlin> g.addV('test').property('date',1599830201421)
==>v[60867]

gremlin> g.V().hasLabel('test').has('date',between(1599830201402,1599830212143))
==>v[60867]
Kelvin Lawrence