12

I have the need to get last id (primary key) of a table (InnoDB), and to do so I perform the following query:

SELECT (SELECT `AUTO_INCREMENT` FROM `information_schema`.`TABLES` WHERE `TABLE_SCHEMA` = 'mySchema' AND `TABLE_NAME` = 'myTable') - 1;

which returns the wrong AUTO_INCREMENT. The problem is the TABLES table of information_schema is not updated with the current value, unless I run the following query:

ANALYZE TABLE `myTable`;

Why doesn't MySQL update information_schema automatically, and how could I fix this behavior?
Running MySQL Server 8.0.13 X64.

Oliver
  • 926
  • 2
  • 12
  • 31

2 Answers2

11

Q: Why doesn't MySQL update information_schema automatically, and how could I fix this behavior?

A: InnoDB holds the auto_increment value in memory, and doesn't persist that to disk.

Behavior of metadata queries (e.g. SHOW TABLE STATUS) is influenced by setting of innodb_stats_on_metadata and innodb_stats_persistent variables.

https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_on_metadata

Forcing an ANALYZE everytime we query metadata can be a drain on performance.

Other than the settings of those variables, or forcing statistics to be collected by manually executing the ANALYZE TABLE, I don't think there's a "fix" for the issue.

(I think that mostly because I don't think it's a problem that needs to be fixed.)


To get the highest value of an auto_increment column in a table, the normative pattern is:

 SELECT MAX(`ai_col`) FROM `myschema`.`mytable`

What puzzles me is why we need to retrieve this particular piece of information. What are we going to use it for?

Certainly, we aren't going to use that in application code to determine a value that was assigned to a row we just inserted. There's no guarantee that the highest value isn't from a row that was inserted by some other session. And we have LAST_INSERT_ID() mechanism to retrieve the value of a row our session just inserted.

If we go with the ANALYZE TABLE to refresh statistics, there's still a small some time between that and a subsequent SELECT... another session could slip in another INSERT so that the value we get from the gather stats could be "out of date" by the time we retrieve it.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • 1
    Thanks for the good explanation. I thought MAX(`id`) could slow the query so found this other approach, but unfortunately doesn't work for InnoDB tables. My use case is the following: i have a table which contains spatial coordinates and on a new insertion, I have to check if the new location is within 10m radius. If so, discard the location. Do you think using this approach is wrong? Thanks! – Oliver Dec 19 '18 at 16:57
  • 2
    Having a suitable index available should make for reasonable performance of `SELECT MAX(mycol) FROM mytable`. The only thing wrong with the use case its a little vague. We need to check if the coordinates of our new (to be inserted) row is within 10mi of the coordinates on the row with the highest value of ai_col ? If we're going to have to query the table to get the coordinates, then we could just do it with something along the lines of `SELECT r.coordinate_col FROM mytable r JOIN ( SELECT MAX(t.id) AS max_id FROM mytable t ) s ON s.max_id = r.id`. – spencer7593 Dec 19 '18 at 17:05
  • 1
    If we're needing to compute the "as the crow flies" Great Circle Distance, we could make the query a little more elaborate, push in the coordinates for our proposed row, do the GCD calculation, and return a distance... – spencer7593 Dec 19 '18 at 17:07
  • I do it the following way: @params lng, lat @returns true->location already exists, false->location doesn't exist: SELECT EXISTS (SELECT * FROM `location` WHERE `id` = (SELECT MAX(`id`) FROM `location`) AND ST_Distance_Sphere(POINT(?, ?), `lnglat`) <= 10); By the way this is not the problem, only the fact the I couldn't retrieve updated values in information_schema. Thanks guys :) – Oliver Dec 19 '18 at 17:09
  • 1
    I'd use EXPLAIN to see the query plan; I'd opt for a `JOIN` to an inline view over the `id = (subquery)` pattern. We just want to make sure MySQL is whittling the result down to a single single row *before* it launches into the distance calculations, we only need to do that on one row. I'd tend to do something like: `SELECT ST_Distance_Sphere(POINT(?, ?), t.lnglat) <= 10 AS bool_ FROM location t JOIN ( SELECT MAX(r.id) AS max_id FROM location r ) s ON s.max_id = t.id` – spencer7593 Dec 19 '18 at 17:14
  • I checked both queries using `EXPLAIN` and yours looks better because it uses indexes better. But wouldn't this query be even better since it uses 2 operations instead of 3? `SELECT ST_DISTANCE_SPHERE(POINT(-122.084,37.4219983),lnglat) <= 10 FROM location WHERE id = (SELECT MAX(id) FROM location);` – Oliver Dec 19 '18 at 18:18
  • 2
    @YanKarin - The subquery is unnecessary if you use `ORDER BY id DESC LIMIT 1`. (Similar to my answer.) Your query (and my variant) will return true/false (1/0) depending on whether it is within 10 units of distance. (Be careful what your units are.) – Rick James Dec 19 '18 at 18:48
  • This is the result of the two `EXPLAIN` as an image, since it's a mess as text: https://i.imgur.com/qezdKPB.jpg Based on the output, which one should I choose (and maybe why)? – Oliver Dec 19 '18 at 19:17
4
SELECT * FROM tbl ORDER BY insert_datetime DESC LIMIT 1;

will get you all the data from, the "latest" inserted row. No need to deal with AUTO_INCREMENT, no need to use subqueries, no ANALYZE, no information_schema, no extra fetch once you have the id, no etc, etc.

Yes, you do need an index on the column that you use to determine what is "latest". Yes, id could be used, but it should not be. AUTO_INCREMENT values are guaranteed to be unique, but nothing else.

Rick James
  • 135,179
  • 13
  • 127
  • 222