I couldn't find a similar question on SO so my apologies if this is a duplicate, but I have 2 seemingly equivalent queries, however they're not executed in the same way which is detrimental for query times.
The first query executes in a matter of a couple of milliseconds, the second I stop after waiting for 15 seconds. The table contains ten's of millions of records.
Below I provide both queries and I'm aware I can force the index to be used on the column "appln_id", but I'm just surprised I even have to force it.
MariaDB [patstat]> explain select * from tls201_appln where appln_id in (1465778,1517002,1);
+------+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | tls201_appln | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where |
+------+-------------+--------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.000 sec)
MariaDB [patstat]> explain select * from tls201_appln where appln_id in (1465778,1517002);
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | tls201_appln | range | PRIMARY | PRIMARY | 4 | NULL | 1 | Using where |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.000 sec)
The first one has 3 elements in the "WHERE IN" statement and to my understanding will force a full table scan "type = ALL". If I lower the amount of arguments to just 2 instead of 3, the query becomes highly optimized as the type = RANGE and will use the primary key.
Any idea on why the number matters? The appln_id is the PRIMARY KEY of the table, so no other index has been made since PRIMARY keys are indexed by default:
MariaDB [patstat]> show columns from tls201_appln;
+-----------------------+------------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+------------------+------+-----+------------+-------+
| APPLN_ID | int(10) unsigned | NO | PRI | NULL | |
Another odd thing that might be relating to the same optimisation issue is that a "select *" (0.000s) is much faster than a "select appln_id" (3min50s) on the same table.
When doing the select column query, I get the following explain:
+------+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | tls201_appln | index | PRIMARY | PRIMARY | 4 | NULL | 1 | Using where; Using index |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
When doing the **select *** query, I get the following explain
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | tls201_appln | range | PRIMARY | PRIMARY | 4 | NULL | 1 | Using where |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
Somehow using the index, or making it selecting a specific column is bumping the query speed like mad as described above.
Some peripherals:
- HD usage for the entire dataset on MariaDB is about 500GB
- It's a disk, not an SSD drive
The full table create statement is the following
CREATE TABLE `tls201_appln` (
`APPLN_ID` int(10) unsigned NOT NULL,
`APPLN_AUTH` char(2) NOT NULL DEFAULT '',
`APPLN_NR` varchar(20) NOT NULL DEFAULT '',
`APPLN_KIND` char(2) DEFAULT '',
`APPLN_FILING_DATE` date DEFAULT NULL,
`APPLN_FILING_YEAR` int(4) DEFAULT NULL,
`APPLN_NR_EPODOC` varchar(20) NOT NULL DEFAULT '',
`appln_nr_original` varchar(100) NOT NULL DEFAULT '',
`IPR_TYPE` char(2) DEFAULT '',
`RECEIVING_OFFICE` char(2) NOT NULL DEFAULT '',
`INTERNAT_APPLN_ID` int(11) DEFAULT NULL,
`int_phase` char(1) NOT NULL DEFAULT 'N',
`reg_phase` char(1) NOT NULL DEFAULT 'N',
`nat_phase` char(1) NOT NULL DEFAULT 'N',
`EARLIEST_FILING_DATE` date NOT NULL DEFAULT '9999-12-31',
`EARLIEST_FILING_YEAR` int(4) NOT NULL DEFAULT 9999,
`EARLIEST_FILING_ID` int(11) NOT NULL DEFAULT 0,
`EARLIEST_PUBLN_DATE` date NOT NULL DEFAULT '9999-12-31',
`EARLIEST_PUBLN_YEAR` int(4) NOT NULL DEFAULT 9999,
`EARLIEST_PAT_PUBLN_ID` int(11) NOT NULL DEFAULT 0,
`GRANTED` char(1) NOT NULL DEFAULT '',
`DOCDB_FAMILY_ID` int(11) NOT NULL DEFAULT 0,
`INPADOC_FAMILY_ID` int(11) NOT NULL DEFAULT 0,
`DOCDB_FAMILY_SIZE` int(4) NOT NULL DEFAULT 0,
`NB_CITING_DOCDB_FAM` int(4) NOT NULL DEFAULT 0,
`NB_APPLICANTS` int(4) NOT NULL DEFAULT 0,
`NB_INVENTORS` int(4) NOT NULL DEFAULT 0,
PRIMARY KEY (`APPLN_ID`),
KEY `IDX201_DOCDB` (`DOCDB_FAMILY_ID`),
KEY `IDX201_INPAD` (`INPADOC_FAMILY_ID`),
KEY `IDX602_YEAR` (`APPLN_FILING_YEAR`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
EDIT
The queries explain in a JSON format instead of the default one:
explain format=json select * from tls201_appln where appln_id in (1465778,1517002,1);
{
"query_block": {
"select_id": 1,
"table": {
"table_name": "tls201_appln",
"access_type": "ALL",
"possible_keys": ["PRIMARY"],
"rows": 1,
"filtered": 100,
"attached_condition": "tls201_appln.APPLN_ID in (1465778,1517002,1)"
}
}
}
explain format=json select * from tls201_appln where appln_id in (1465778,1517002);
{
"query_block": {
"select_id": 1,
"table": {
"table_name": "tls201_appln",
"access_type": "range",
"possible_keys": ["PRIMARY"],
"key": "PRIMARY",
"key_length": "4",
"used_key_parts": ["APPLN_ID"],
"rows": 1,
"filtered": 100,
"attached_condition": "tls201_appln.APPLN_ID in (1465778,1517002)"
}
}
}