1

Classic problem: I have many many client applications (native and web apps) connected to a system, all of which need to get updates in real-time as information changes. Most of the information changes rarely. When it does, the client systems need to know this very quickly. The backend system uses MariaDB in a Galera cluster with InnoDB as the storage engine.

I am aware that you can use the INFORMATION_SCHEMA for this sort of problem, but it is non-deterministic when using INNODB tables - especially in a Galera cluster environment. It is possible that if I rely upon the UPDATE_TIME of the tables in question it will be slightly out of date because of in-memory buffering. It is also possible that different servers in the cluster will have different values for this time at any given moment. Is this a big deal? I doubt it. But I was looking for a solution that would give me accurate data and that was performant.

Each table in question has an 'updated_at' column with an index. I have tried querying each table individually (e.g., select updated_at from TABLE order by updated_at limit 1), but this is very expensive when there are a lot of tables to check. A stored procedure makes this somewhat more efficient, but it still feels inelegant.

For example, would triggers make sense here? The tables are updated relatively rarely when compared to the read activity on the tables. Could I set up an insert/delete/update trigger on each table that updated a corresponding last_updated time in ANOTHER table? Then just monitor that table to decide if data I cared about has changed? Should I use some sort of pub/sub thing to signal 'out of band' that data has changed? Then have things that care subscribe to that?

This has to be a solved problem. Does anyone have a great solution to suggest?

hardillb
  • 54,545
  • 11
  • 67
  • 105

1 Answers1

0

You could query across the metadata this way:

SELECT MAX(UPDATE_TIME)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'myschema'

But you said the metadata is not precise enough for you.

You would have to get the latest updated_at from multiple tables:

SELECT MAX(updated_at) AS max_updated_at
FROM (
    SELECT MAX(updated_at) AS updated_at FROM mytable1
    UNION SELECT MAX(updated_at) FROM mytable2
    UNION SELECT MAX(updated_at) FROM mytable3
    UNION SELECT MAX(updated_at) FROM mytable4
    UNION ...
) AS t

Or alternatively:

SELECT GREATEST(
    t1.updated_at,
    t2.updated_at,
    t3.updated_at,
    t4.updated_at,
    ...
  ) AS max_updated_at
FROM
(SELECT MAX(updated_at) AS updated_at FROM mytable1) AS t1, 
(SELECT MAX(updated_at) AS updated_at FROM mytable2) AS t2,
(SELECT MAX(updated_at) AS updated_at FROM mytable3) AS t3,
(SELECT MAX(updated_at) AS updated_at FROM mytable4) AS t4,
... 

Re your comment:

Apologies for misinterpreting your question. I assumed "last update time for a collection of tables" meant the latest update time for the collection.

To get the last update for each table, we can adapt any of the three alternatives I showed above:

SELECT TABLE_NAME, UPDATE_TIME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'myschema'

Or:

SELECT 'mytable1' AS table_name, updated_at FROM mytable1
UNION SELECT 'mytable2', updated_at FROM mytable2
UNION SELECT 'mytable2', updated_at FROM mytable3
UNION SELECT 'mytable2', updated_at FROM mytable4
UNION ...

Or:

SELECT 
    t1.updated_at AS mytable1_updated_at,
    t2.updated_at AS mytable2_updated_at,
    t3.updated_at AS mytable3_updated_at,
    t4.updated_at AS mytable4_updated_at,
    ...
FROM
(SELECT MAX(updated_at) AS updated_at FROM mytable1) AS t1, 
(SELECT MAX(updated_at) AS updated_at FROM mytable2) AS t2,
(SELECT MAX(updated_at) AS updated_at FROM mytable3) AS t3,
(SELECT MAX(updated_at) AS updated_at FROM mytable4) AS t4,
... 

If the table's updated_at column has an index, MySQL can get the MAX() value quickly from the last entry in the index, without having to scan the whole table.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I am not looking for which table was updated most recently. I am looking for the most recent update time from EACH table. But I think I see how to use MAX(updated_at) for a query on all the tables. I am worried it might be a little slow, but I will test it out. – Shane McCarron Apr 06 '20 at 11:56