0

I couldn't find a similar question on SO so my apologies if this is a duplicate, but I have 2 seemingly equivalent queries, however they're not executed in the same way which is detrimental for query times.

The first query executes in a matter of a couple of milliseconds, the second I stop after waiting for 15 seconds. The table contains ten's of millions of records.

Below I provide both queries and I'm aware I can force the index to be used on the column "appln_id", but I'm just surprised I even have to force it.

MariaDB [patstat]> explain select * from tls201_appln where appln_id in (1465778,1517002,1);
+------+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+--------------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | tls201_appln | ALL  | PRIMARY       | NULL | NULL    | NULL |    1 | Using where |
+------+-------------+--------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.000 sec)

MariaDB [patstat]> explain select * from tls201_appln where appln_id in (1465778,1517002);
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table        | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | tls201_appln | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.000 sec)

The first one has 3 elements in the "WHERE IN" statement and to my understanding will force a full table scan "type = ALL". If I lower the amount of arguments to just 2 instead of 3, the query becomes highly optimized as the type = RANGE and will use the primary key.

Any idea on why the number matters? The appln_id is the PRIMARY KEY of the table, so no other index has been made since PRIMARY keys are indexed by default:

MariaDB [patstat]> show columns from tls201_appln;
+-----------------------+------------------+------+-----+------------+-------+
| Field                 | Type             | Null | Key | Default    | Extra |
+-----------------------+------------------+------+-----+------------+-------+
| APPLN_ID              | int(10) unsigned | NO   | PRI | NULL       |       |

Another odd thing that might be relating to the same optimisation issue is that a "select *" (0.000s) is much faster than a "select appln_id" (3min50s) on the same table.

When doing the select column query, I get the following explain:

+------+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| id   | select_type | table        | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tls201_appln | index | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where; Using index |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+

When doing the **select *** query, I get the following explain

+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table        | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | tls201_appln | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+

Somehow using the index, or making it selecting a specific column is bumping the query speed like mad as described above.

Some peripherals:

  • HD usage for the entire dataset on MariaDB is about 500GB
  • It's a disk, not an SSD drive

The full table create statement is the following

 CREATE TABLE `tls201_appln` (
  `APPLN_ID` int(10) unsigned NOT NULL,
  `APPLN_AUTH` char(2) NOT NULL DEFAULT '',
  `APPLN_NR` varchar(20) NOT NULL DEFAULT '',
  `APPLN_KIND` char(2) DEFAULT '',
  `APPLN_FILING_DATE` date DEFAULT NULL,
  `APPLN_FILING_YEAR` int(4) DEFAULT NULL,
  `APPLN_NR_EPODOC` varchar(20) NOT NULL DEFAULT '',
  `appln_nr_original` varchar(100) NOT NULL DEFAULT '',
  `IPR_TYPE` char(2) DEFAULT '',
  `RECEIVING_OFFICE` char(2) NOT NULL DEFAULT '',
  `INTERNAT_APPLN_ID` int(11) DEFAULT NULL,
  `int_phase` char(1) NOT NULL DEFAULT 'N',
  `reg_phase` char(1) NOT NULL DEFAULT 'N',
  `nat_phase` char(1) NOT NULL DEFAULT 'N',
  `EARLIEST_FILING_DATE` date NOT NULL DEFAULT '9999-12-31',
  `EARLIEST_FILING_YEAR` int(4) NOT NULL DEFAULT 9999,
  `EARLIEST_FILING_ID` int(11) NOT NULL DEFAULT 0,
  `EARLIEST_PUBLN_DATE` date NOT NULL DEFAULT '9999-12-31',
  `EARLIEST_PUBLN_YEAR` int(4) NOT NULL DEFAULT 9999,
  `EARLIEST_PAT_PUBLN_ID` int(11) NOT NULL DEFAULT 0,
  `GRANTED` char(1) NOT NULL DEFAULT '',
  `DOCDB_FAMILY_ID` int(11) NOT NULL DEFAULT 0,
  `INPADOC_FAMILY_ID` int(11) NOT NULL DEFAULT 0,
  `DOCDB_FAMILY_SIZE` int(4) NOT NULL DEFAULT 0,
  `NB_CITING_DOCDB_FAM` int(4) NOT NULL DEFAULT 0,
  `NB_APPLICANTS` int(4) NOT NULL DEFAULT 0,
  `NB_INVENTORS` int(4) NOT NULL DEFAULT 0,
  PRIMARY KEY (`APPLN_ID`),
  KEY `IDX201_DOCDB` (`DOCDB_FAMILY_ID`),
  KEY `IDX201_INPAD` (`INPADOC_FAMILY_ID`),
  KEY `IDX602_YEAR` (`APPLN_FILING_YEAR`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

EDIT

The queries explain in a JSON format instead of the default one:

explain format=json select * from tls201_appln where appln_id in (1465778,1517002,1);

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "tls201_appln",
      "access_type": "ALL",
      "possible_keys": ["PRIMARY"],
      "rows": 1,
      "filtered": 100,
      "attached_condition": "tls201_appln.APPLN_ID in (1465778,1517002,1)"
    }
  }
} 
explain format=json select * from tls201_appln where appln_id in (1465778,1517002);
{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "tls201_appln",
      "access_type": "range",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["APPLN_ID"],
      "rows": 1,
      "filtered": 100,
      "attached_condition": "tls201_appln.APPLN_ID in (1465778,1517002)"
    }
  }
}
jlos
  • 1,010
  • 1
  • 8
  • 12
  • I cannot reproduce this behavior in MariaDB 5.5.68. – KIKO Software Aug 28 '21 at 09:38
  • Hi @KIKOSoftware, I'm using Mariadb10.3 – jlos Aug 28 '21 at 09:39
  • You say: "which is detrimental for query times". It would be useful to give an clear example of this. I don't mean a whole dataset to run the queries on, but just an example of a table, how many rows, and the difference in time between the two queries. – KIKO Software Aug 28 '21 at 09:43
  • Good question, I'll update it in my description. – jlos Aug 28 '21 at 09:49
  • @KIKOSoftware I updated my post to answer your questions, which are very valid and I'll keep that in mind for new posts. – jlos Aug 28 '21 at 09:53
  • Ah, yes, that's indeed a very large difference. These are such simple queries. I have no idea why the obvious index is dropped by the optimizer. You have chosen a third value that quite far removed from the first two. Does it do the same thing if the third value is close, for instance 1567403? – KIKO Software Aug 28 '21 at 10:01
  • I've done some further analysis and the big differentiator, when I force the index, which is something I can live with, is the select * versus a select amount of columns. I'll update the ticket to make that clear – jlos Aug 28 '21 at 10:04
  • 1
    This might very well be an resources issue. You might want to look into partitioning the table, see: https://mariadb.com/kb/en/partitioning-tables/ – KIKO Software Aug 28 '21 at 10:08
  • Please provide `EXPLAIN FORMAT=JSON SELECT ...` for each case. – Rick James Aug 29 '21 at 03:39
  • @RickJames I added them in an EDIT section at the bottom of the post, is this just for easier reading? – jlos Aug 30 '21 at 06:52

0 Answers0