I have a table called cities
:
CREATE TABLE `cities` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`city` varchar(255) DEFAULT NULL,
`state_code` varchar(10) DEFAULT NULL,
`country_code` varchar(10) DEFAULT NULL,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL,
`geom` geometry NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
KEY `country_code` (`country_code`),
SPATIAL KEY `geom` (`geom`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
It contains around 4000 rows, none of which have the state_code
field populated. I'm trying to fill that field using data I have in another table called geodata
:
CREATE TABLE `geodata` (
`region_id` bigint(20) NOT NULL AUTO_INCREMENT,
`country_code` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
`state_code` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
`geom` geometry NOT NULL,
PRIMARY KEY (`region_id`),
KEY `country_code` (`country_code`),
KEY `state_code` (`state_code`),
SPATIAL KEY `geom` (`geom`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
That table has ~1500 rows with the polygon data for each of the US states and Canadian provinces (some states have more than one entry because the source data broke it up into multiple geometries).
I'm trying to find the most efficient query for looking up the cities.geom
point within all the geodata.geom
polygons, and then updating cities.state_code
with the appropriate data from the geodata
table.
Just as a test (SELECTing, not UPDATEing), I tried this:
SELECT
C.cityid_ppn,
C.city,
C.latitude,
C.longitude,
G.state_code,
G.country_code
FROM cities C
LEFT JOIN geodata G ON (
C.country_code = G.country_code
AND ST_WITHIN(C.geom, G.geom)
)
WHERE
G.country_code='CA'
AND g.state_code='ON'
;
Even with this subset of data, the query takes 9 seconds. Trying to run it on the full set (i.e. without the WHERE ...
clause above), I gave up after waiting 5 minutes.
What am I doing wrong? And what would be the best way to do the update of cities
?