1

I think this is a Gaps and Islands problem however my gaps are actually defined by logic and constraints rather than missing data. I am also working with time-series style data rather than integer IDs so many of the examples I've found don't seem to apply - or I don't know how.

I am using PostgreSQL with PostGIS extension and have the following database table:

create table event
(
    id uuid not null,
    organisation_id uuid not null,
    asset_id uuid not null,
    time timestamp with time zone not null,
    data jsonb not null,
    location extensions.geography(Point,4326),
    special_location boolean,
    constraint event_pkey
        primary key (id),
    constraint event_organisation_id_fk
        foreign key (organisation_id) references organisation,
    constraint event_asset_id_fk
        foreign key (asset_id) references asset
);

create unique index event_id_uindex on event (id);

create index event_location_index on event USING GIST (location);

I am attempting to write a query to retrieve a list of sessions for a specified asset_id where the sessions start and/or end between the min and max date-times specified in the query.

e.g.
For <asset_id> get me all trips between <TimeA> and <TimeB>.

StartTime1, StartLocation1, EndTime1, EndLocation1
StartTime2, StartLocation2, EndTime2, EndLocation2
StartTime3, StartLocation3, EndTime3, EndLocation3
StartTime4, StartLocation4, EndTime4, EndLocation4

There is nothing in the event records that singles them out as Start of Session or End of Session events. Instead, the following rules apply:

  • A session is considered to start when the asset's location is 'M' meters away from the previous session's EndLocation.

  • If there is no previous session for the specified asset_id then use the first known location of the asset as a pseudo EndLocation to compare against.

  • A session's StartLocation should be reported as the previous session's EndLocation not the location after moving 'M' meters.

  • Movement may occur in between sessions but if the vehicle has moved less than 'M' meters away from the previous session's EndLocation then this can be disregarded. i.e. No summing of movements, the distance should be as the crow flies from the previous EndLocation.

  • A session is considered to have ended when the asset has not moved more than 'M' meters in 'T' time where 'T' time will actually be specified as two values... one value of 'T' to be used if the 'special_location' field for the assets is set to true. The other value of 'T' to be used if the 'special_location' field is set to false. i.e. If the asset is in a special location then the trip will end much sooner than if the asset is in an unknown (non-special) location.

  • Nice to have but not a show stopper... A session's StartLocation should be the previous session's EndLocation (as noted above) however ideally we should be able to allow for missing events by specifying a value of 'X' meters in the query. If the asset is within 'X' meters of the previous session's EndLocation then the previous session's EndLocation should be used as the next session's StartLocation. Otherwise the actual location recorded for the first event of the session should be used as the next session's StartLocation.

I realise the above is complex and might be too much for a single Stack Overflow question. If you think this then I will still be very grateful for suggestions and examples that show how parts of the above might be achieved and/or topics I should read-up on. To be honest, someone confirming if the above is even possible in a single query would be a start!

Topics I have learned about and experimented with whilst considering approaches to the above include:

  • PostgreSQL Window Functions

  • Example 'Gaps and Islands' problems. I have experimented with a lot of examples for this but there aren't many examples that use time rather than integer IDs. Also, my gaps are defined by constraints rather than actual gaps in data. Example starting points are:

    https://stackoverflow.com/a/24101979/9599047

    https://dba.stackexchange.com/a/167069

  • This is the first time I have used PostGIS and I was at least able to calculate distances using ST_Distance with queries such as the following:

    SELECT ST_Distance(location::geography, lag(location::geography, 1) OVER (ORDER BY time ASC)) as distance from event;

Alex
  • 11
  • 1
  • I haven't read all the details, but this looks similar to [Packing Intervals by Itzik Ben-Gan](http://blogs.solidq.com/en/sqlserver/packing-intervals/), specifically the last part "Ignoring Gaps of Up To a Certain Length". In your case the length of the gap would be the distance in meters. The article is for SQL Server, but Postgres has all the features of SQL Server that were mentioned in the article. – Vladimir Baranov Apr 05 '18 at 05:42
  • Thanks @VladimirBaranov This is not something I had come across in the earlier research so I will spend some time trying to determine if i can apply it to my scenario. However, the first things that stands out is that the example dataset in the article already includes session start and end times where these are the values we are trying to determine. – Alex Apr 05 '18 at 10:15
  • I thought that you have many intervals with small gaps between them. If you pack those intervals into one interval ignoring the gaps shorter than a certain length, it would give you the session start and end times that you are trying to determine. – Vladimir Baranov Apr 05 '18 at 10:43

0 Answers0