1

I have a table called cities:

CREATE TABLE `cities` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `city` varchar(255) DEFAULT NULL,
  `state_code` varchar(10) DEFAULT NULL,
  `country_code` varchar(10) DEFAULT NULL,
  `latitude` double DEFAULT NULL,
  `longitude` double DEFAULT NULL,
  `geom` geometry NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  KEY `country_code` (`country_code`),
  SPATIAL KEY `geom` (`geom`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It contains around 4000 rows, none of which have the state_code field populated. I'm trying to fill that field using data I have in another table called geodata:

CREATE TABLE `geodata` (
  `region_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `country_code` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  `state_code` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  `geom` geometry NOT NULL,
  PRIMARY KEY (`region_id`),
  KEY `country_code` (`country_code`),
  KEY `state_code` (`state_code`),
  SPATIAL KEY `geom` (`geom`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

That table has ~1500 rows with the polygon data for each of the US states and Canadian provinces (some states have more than one entry because the source data broke it up into multiple geometries).

I'm trying to find the most efficient query for looking up the cities.geom point within all the geodata.geom polygons, and then updating cities.state_code with the appropriate data from the geodata table.

Just as a test (SELECTing, not UPDATEing), I tried this:

SELECT
    C.cityid_ppn,
    C.city,
    C.latitude,
    C.longitude,
    G.state_code,
    G.country_code
FROM cities C
LEFT JOIN geodata G ON (
    C.country_code = G.country_code
    AND ST_WITHIN(C.geom, G.geom)
)
WHERE
    G.country_code='CA'
    AND g.state_code='ON'
;

Even with this subset of data, the query takes 9 seconds. Trying to run it on the full set (i.e. without the WHERE ... clause above), I gave up after waiting 5 minutes.

What am I doing wrong? And what would be the best way to do the update of cities?

Colin
  • 2,109
  • 1
  • 20
  • 22
  • 1
    You can try to change city geom data type from GEOMETRY to POINT. Or use lon/lan if city.geom holds not the point but the city's boundries. Also, I think you should change city state & country codes data types to match them to regions (charset/collate). Unfortunately, I do not know if this really help to speed up your query. – fifonik Mar 19 '19 at 22:09
  • Thanks! I tried changing `cities.geom` to POINT and even `geodata.geom` to POLYGON, and that didn't seem to affect the query speed at all. Also, changing the charset/collate on the code columns to match didn't help either. – Colin Mar 20 '19 at 11:48
  • 1
    Probably you have already seen this (looks like your case): https://stackoverflow.com/questions/53182385/optimizing-mysql-query-to-select-all-points-with-in-polygon-using-spatial-indexe – fifonik Mar 21 '19 at 04:08

0 Answers0