I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y)
(of cardinality ~100 x ~1000) and all edges are strictly increasing in x
.
There is furthermore a dictionary of ~1k (key, val)
pairs associated with each vertex.
I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have omitted to include both it and the foreign key constraints that refer to it in my extracts below); a table which holds the dictionaries; and a 'closure table' of connected vertices as described so eloquently by Bill Karwin.
The table of vertex dictionaries is defined as follows:
CREATE TABLE `VertexDictionary` (
`x` smallint(6) unsigned NOT NULL,
`y` smallint(6) unsigned NOT NULL,
`key` varchar(50) NOT NULL DEFAULT '',
`val` smallint(1) DEFAULT NULL,
PRIMARY KEY (`x`, `y` , `key`),
KEY `dict` (`x`, `key`, `val`)
);
and the closure table of connected vertices as:
CREATE TABLE `ConnectedVertices` (
`tail_x` smallint(6) unsigned NOT NULL,
`tail_y` smallint(6) unsigned NOT NULL,
`head_x` smallint(6) unsigned NOT NULL,
`head_y` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`tail_x`, `tail_y`, `head_x`),
KEY `reverse` (`head_x`, `head_y`, `tail_x`),
KEY `fx` (`tail_x`, `head_x`),
KEY `rx` (`head_x`, `tail_x`)
);
There is also a dictionary of (x, key)
pairs such that for each such pair, all vertices identified with that x
have within their dictionaries a value for that key
. This dictionary is stored in a fourth table:
CREATE TABLE `SpecialKeys` (
`x` smallint(6) unsigned NOT NULL,
`key` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`x`),
KEY `xkey` (`x`, `key`)
);
I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X
, together with the associated value of any SpecialKeys
connected to the left:
SELECT DISTINCT
`v`.`key`,
`u`.`val`
FROM
`ConnectedVertices` AS `c`
JOIN `VertexDictionary` AS `u` ON (`u`.`x`, `u`.`y` ) = (`c`.`tail_x`, `c`.`tail_y`)
JOIN `VertexDictionary` AS `v` ON (`v`.`x`, `v`.`y` ) = (`c`.`head_x`, `c`.`head_y`)
JOIN `SpecialKeys` AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
`v`.`x` = X
;
for which the EXPLAIN
output is:
id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE k index PRIMARY,xkey xkey 154 NULL 40 Using index; Using temporary 1 SIMPLE c ref PRIMARY,reverse,fx,rx PRIMARY 2 db.k.x 1 Using where 1 SIMPLE v ref PRIMARY,dict PRIMARY 4 const,db.c.head_y 136 Using index 1 SIMPLE u eq_ref PRIMARY,dict PRIMARY 156 db.c.tail_x,db.c.tail_y,db.k.key 1 Using where
But this query takes ~10s to complete. Been banging my head against a brick wall trying to improve matters, but to no avail.
Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!
UPDATE
I'm still getting nowhere with this, although I did rebuild the tables and found the EXPLAIN
output to be slightly different (as now shown above, the number of rows fetched from v
had increased from 1 to 136!); the query is still taking ~10s to execute.
I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue)
and all (x, y, key)
tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time... how can I improve the time taken to perform that join?
Output of SHOW VARIABLES LIKE '%innodb%';
below:
Variable_name Value ------------------------------------------------------------ have_innodb YES ignore_builtin_innodb ON innodb_adaptive_flushing ON innodb_adaptive_hash_index ON innodb_additional_mem_pool_size 2097152 innodb_autoextend_increment 8 innodb_autoinc_lock_mode 1 innodb_buffer_pool_size 1179648000 innodb_change_buffering inserts innodb_checksums ON innodb_commit_concurrency 0 innodb_concurrency_tickets 500 innodb_data_file_path ibdata1:10M:autoextend innodb_data_home_dir /rdsdbdata/db/innodb innodb_doublewrite ON innodb_fast_shutdown 1 innodb_file_format Antelope innodb_file_format_check Barracuda innodb_file_per_table ON innodb_flush_log_at_trx_commit 1 innodb_flush_method O_DIRECT innodb_force_recovery 0 innodb_io_capacity 200 innodb_lock_wait_timeout 50 innodb_locks_unsafe_for_binlog OFF innodb_log_buffer_size 8388608 innodb_log_file_size 134217728 innodb_log_files_in_group 2 innodb_log_group_home_dir /rdsdbdata/log/innodb innodb_max_dirty_pages_pct 75 innodb_max_purge_lag 0 innodb_mirrored_log_groups 1 innodb_old_blocks_pct 37 innodb_old_blocks_time 0 innodb_open_files 300 innodb_read_ahead_threshold 56 innodb_read_io_threads 4 innodb_replication_delay 0 innodb_rollback_on_timeout OFF innodb_spin_wait_delay 6 innodb_stats_method nulls_equal innodb_stats_on_metadata ON innodb_stats_sample_pages 8 innodb_strict_mode OFF innodb_support_xa ON innodb_sync_spin_loops 30 innodb_table_locks ON innodb_thread_concurrency 0 innodb_thread_sleep_delay 10000 innodb_use_sys_malloc ON innodb_version 1.0.16 innodb_write_io_threads 4