1

In this example, we have a database that has various events, all stored with a uuid.

Our goal is to calculate the difference (in minutes) between the initial timestamp of the uuid and the first timestamp of the next record with the same uuid.

Right now, we're doing it manually in code with multiple SQL statements, which due to high usage is causing issues so we started caching results, however, the requirement is realtime.

Now I'm trying to figure out a way to get the database server to do the work in a single SQL statement.

I've tried various ways of joining results but no luck. There's probably some pattern magic someone knows that would go a long way in helping us figure this out. Any suggestions would be appreciated!

uuid stamp
0df8-e817-050c-d0ce-06c0-53d6-8f99-c636 2021-11-29 15:39:52.000000
0df8-e817-050c-d0ce-06c0-53d6-8f99-c636 2021-11-29 15:33:43.000000
0df8-e817-050c-d0ce-06c0-53d6-8f99-c636 2021-11-29 15:32:26.000000
0a44-e1c3-efc9-6183-8afb-c39c-ee9d-7b75 2021-11-29 14:30:58.000000
ffda-f1ee-ad99-1f9a-16e0-30e6-1ba8-9022 2021-11-23 22:00:50.000000
ffda-f1ee-ad99-1f9a-16e0-30e6-1ba8-9022 2021-11-23 22:00:37.000000
ffda-f1ee-ad99-1f9a-16e0-30e6-1ba8-9022 2021-11-23 21:51:30.000000
fbcd-d8db-777e-f736-00db-f011-b239-2fb7 2021-11-23 19:43:06.000000
fbcd-d8db-777e-f736-00db-f011-b239-2fb7 2021-11-23 19:26:52.000000
fbcd-d8db-777e-f736-00db-f011-b239-2fb7 2021-11-23 19:24:30.000000

Taking 0df8-e817-050c-d0ce-06c0-53d6-8f99-c636 as an example, the lowest timestamp is:

2021-11-29 15:32:26.000000

And the first follow up timestamp is:

2021-11-29 15:39:52.000000

Seems simple enough to get the difference in minutes ... but a single SQL statement solution escapes me.

forpas
  • 160,666
  • 10
  • 38
  • 76
Bill Kervaski
  • 542
  • 6
  • 17
  • If you are using a more recent version of MySQL then you can use [`LEAD()`](https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lead) to get the next stamp, and [`TIMESTAMPDIFF()`](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timestampdiff) to get the difference, leaving:: `SELECT GUID, TIMESTAMPDIFF(MINUTE, Stamp, LEAD(Stamp) OVER(PARTITION BY Guid ORDER BY Stamp)) FROM YourTable` – GarethD Nov 29 '21 at 16:22
  • Not MAX() but the difference in minutes between the first and second rows of the same uuid (thanks, though) – Bill Kervaski Nov 29 '21 at 16:23
  • How is `2021-11-29 15:39:52.000000` the initial value? `2021-11-29 15:33:43.000000` is less than that. – forpas Nov 29 '21 at 16:25
  • Type-o, fixed! (thanks) – Bill Kervaski Nov 29 '21 at 16:26
  • We're using 5.6, sadly, no LEAD() support. That would be awesome. – Bill Kervaski Nov 29 '21 at 16:27
  • `2021-11-29 15:32:26.000000` is th einitial value of `0df8-e817-050c-d0ce-06c0-53d6-8f99-c636` and `2021-11-29 15:33:43.000000` is the next. – forpas Nov 29 '21 at 16:28
  • For `0df8-e817-050c-d0ce-06c0-53d6-8f99-c636` the lowest timestamp is not `2021-11-29 15:33:43.000000` – forpas Nov 29 '21 at 16:36

1 Answers1

1

For your version of MySql that does support window functions you can use a self join and aggregation to get the lowest and the next timestamps and with TIMESTAMPDIFF() get their difference:

SELECT t1.uuid, 
       TIMESTAMPDIFF(MINUTE, MIN(t1.stamp), MIN(t2.stamp)) diff
FROM tablename t1 LEFT JOIN tablename t2
ON t2.uuid = t1.uuid AND t1.stamp < t2.stamp
GROUP BY t1.uuid;

See the demo.

forpas
  • 160,666
  • 10
  • 38
  • 76
  • Thanks :) I tried something similar but ran into performance issues, there's a couple billion rows in this table. You confirmed I was on the right track, experimenting now. – Bill Kervaski Nov 29 '21 at 16:44
  • @BillKervaski a composite index on uuid and stamp would help but this is something that should be tested. – forpas Nov 29 '21 at 16:46
  • Experimenting with constraining the initial scan with BETWEEN (as the timestamp column is indexed) but the problem is I need to only compare the last X records, regardless of the date range. Not seeing a clear way to avoid a full table scan here without a nested select. Still working it. – Bill Kervaski Nov 29 '21 at 17:02
  • 1
    Okay, definitely about 1000% better than before using BETWEEN() to limit the initial scan then just grabbing the last X records. Big thanks!!! – Bill Kervaski Nov 29 '21 at 17:16