1

I am working with the Apache AGE graph database extension for PostgreSQL, and I have a use case where I need to store and query time-based events. My dataset consists of various events occurring at different timestamps, and I want to efficiently retrieve events that occurred within a specific time range or during a certain period.

For example, let's say I have a graph representing social media posts and their timestamps. I want to be able to query all posts that occurred within the last week or during a specific month.

Here's what I've tried:

  • I've created nodes representing posts with a timestamp property.
CALL age.cypher('CREATE (p:Post {content: $1, timestamp: $2})', ['This is a post', '2022-11-12T13:24:00Z']);
  • I attempted to query the posts within a specific date range using the following query:
CALL age.cypher('MATCH (p:Post) WHERE datetime(p.timestamp) >= datetime($1) AND datetime(p.timestamp) <= datetime($2) RETURN p', ['2022-11-01T00:00:00Z', '2022-11-30T23:59:59Z']);

While this approach seems to work, I'm not sure if it's the most efficient way to handle time-based data in a graph database like Apache AGE.

I was expecting the query to return all posts within the specified date range. Although it does return the expected results, I am concerned about the performance and scalability of this approach when dealing with a large number of events.

My questions are:

  • What is the best way to model and store time-based events in Apache AGE?
  • How can I efficiently query events that occurred within a specific time range or period?
  • Are there any best practices or recommendations for handling time-based data in a graph database like Apache AGE? Any guidance or examples would be greatly appreciated.

2 Answers2

1

The custom datatype that Apache AGE uses to store properties for each vertex/edge, an agtype [1], does not currently support datetime objects. This maybe a case where you want to use the integration with native PostgreSQL and have a traditional SQL table of post IDs and timestamps. (Just a two column table to support look ups of post IDs related to timestamps of a specific range.)

While you cannot embed SQL directly into a Cypher query in Apache AGE, you could create a User-Defined-Function [2] to make the SQL call to the post-to-timestamp table and fetch the post IDs and return them to the Cypher query.

[1] https://age.apache.org/age-manual/master/intro/types.html

[2] https://age.apache.org/age-manual/master/advanced/sql_in_cypher.html

Taylor Riggan
  • 1,963
  • 6
  • 12
0

When dealing with time-based events in apache age following points should be considered for effective querying

  • date/time format should be consistent for timestamp properties in all involved nodes.
  • Secondly, if a lot of querying (frequent querying) is based on timestamp ranges then partition of data by timestamp can help for better performance.

You can also use BETWEEN operator within the cypher query to deal with ranges that makes it more convenient to deal with ranges.

Umer Freak
  • 21
  • 3