1

I have a table with 2 million rows. I have created index on (expression of an unique column). That expression will map the column's value to a lower number space, meaning the outcome of that expression is not unique. Now I want to create index on (DISTINCT of that expression). The reason behind my wanting to create index again on the expression is that DISTINCT and GROUP BY queries run slow (~900ms).

My need is this,

CREATE INDEX idx ON mytable(DISTINCT myColumn|9223372036854775806);       //invalid

What I have already is,

CREATE INDEX idx ON mytable(myColumn|9223372036854775806);

My ultimate goal is to make the following query faster(<100ms)

SELECT id FROM mytable GROUP BY myColumn|9223372036854775806;

Currently it takes 900ms to execute the above query

PoomaniGP
  • 80
  • 1
  • 1
  • 11

1 Answers1

0

DISTINCT can only be used as part of a SELECT i.e. it removes duplicate rows.

DISTINCT's complement is ALL which is the default and hence why you rarely see ALL.

An Index has to include an entry for all rows that it covers so DISTINCT would contravene the purpose of the index as some of the rows covered would be omitted.

What CREATE INDEX idx ON mytable(myColumn|9223372036854775806); is doing is splitting the index into two primary groups/branches as |9223372036854775806 results in either 9223372036854775806 or 9223372036854775807, as such DISTINCT would result in 2 rows (typically) e.g. using

SELECT DISTINCT mycolumn,myColumn|9223372036854775806 AS x FROM mytable;

results in (from a table populated with random values with 1 Million rows as per tesing code below) :-

enter image description here

  • i.e 2 rows irrespective of the number of rows
    • (unless the number is very low and all values of the myColumn|9223372036854775806 happend to all be the same result when there would only be 1 row).

Partial Indexes

A Partial index, is one in which a WHERE clause is used to reduce the indexed rows, the index is selected if the query meets one of 2 rules. Partial Indexes - 3. Queries Using Partial Indexes

Theorectically you would use (to get just 2 rows indxed) :-

CREATE INDEX idx3 ON mytable(myColumn) WHERE myColumn =  (SELECT DISTINCT myColumn|9223372036854775806 FROM mytable);
  • However, the above would fail with subqueries prohibited in partial index WHERE clauses

As such you appear to want to ascertain a means of creating indexes e.g. you could split into 10 partial indexes using :-

CREATE INDEX idx10 ON mytable(mycolumn) WHERE mycolumn < 9223372036854775806 / 10;
CREATE INDEX idx20 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 10 AND mycolumn < 9223372036854775806 / 20;
CREATE INDEX idx30 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 20 AND mycolumn < 9223372036854775806 / 30;
...... and so on
CREATE INDEX idx100 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 90;

In which case :-

EXPLAIN QUERY PLAN
SELECT mycolumn FROM mytable WHERE mycolumn < 9223372036854775806 / 10 LIMIT 100;

Results in :-

enter image description here

  • i.e idx10 has been chosen by the query planner as it has determined that the SELECT's WHERE clause suits using index ix10.

  • Note the above is solely an example of partial indexes, it may well not be suitable for your outcome.

Testing Code

The above was partly generated from testing using the following code, which you may find helpful :-

DROP TABLE IF EXISTS mytable;
DROP INDEX IF EXISTS idx1;
DROP INDEX IF EXISTS idx2;
DROP INDEX IF EXISTS idx10;
DROP INDEX IF EXISTS idx20;
DROP INDEX IF EXISTS idx30;
DROP INDEX IF EXISTS idx40;
DROP INDEX IF EXISTS idx50;
DROP INDEX IF EXISTS idx60;
DROP INDEX IF EXISTS idx70;
DROP INDEX IF EXISTS idx80;
DROP INDEX IF EXISTS idx90;
DROP INDEX IF EXISTS idx100;
CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, mycolumn INTEGER DEFAULT (CAST(abs(random()) AS INTEGER)));
CREATE INDEX idx1 ON mytable(myColumn|9223372036854775806);
-- CREATE INDEX idx2 ON mytable(DISTINCT myColumn|9223372036854775806); CANNOT BE USED AS DISTINCT is syntax error.
-- CREATE INDEX idx3 ON mytable(myColumn) WHERE myColumn =  (SELECT DISTINCT myColumn|9223372036854775806 FROM mytable);
CREATE INDEX idx10 ON mytable(mycolumn) WHERE mycolumn < 9223372036854775806 / 10;
CREATE INDEX idx20 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 10 AND mycolumn < 9223372036854775806 / 20;
CREATE INDEX idx30 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 20 AND mycolumn < 9223372036854775806 / 30;
CREATE INDEX idx40 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 30 AND mycolumn < 9223372036854775806 / 40;
CREATE INDEX idx50 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 40 AND mycolumn < 9223372036854775806 / 50;
CREATE INDEX idx60 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 50 AND mycolumn < 9223372036854775806 / 60;
CREATE INDEX idx70 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 60 AND mycolumn < 9223372036854775806 / 70;
CREATE INDEX idx80 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 70 AND mycolumn < 9223372036854775806 / 80;
CREATE INDEX idx90 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 80 AND mycolumn < 9223372036854775806 / 90;
CREATE INDEX idx100 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 90;
/* Load some data */
WITH RECURSIVE cte(x,y) AS (
    SELECT 1,CAST(abs(random()) AS INTEGER)
    UNION ALL SELECT x+1,CAST(abs(random()) AS INTEGER) FROM cte LIMIT 1000000
)
INSERT INTO mytable (mycolumn) SELECT y FROM cte;

/* Show some sample data */
SELECT mycolumn,myColumn|9223372036854775806 AS x FROM mytable LIMIT 100;

/* LOOK AT WHAT THE QUERY PLANNER COMES UP WITH */
EXPLAIN QUERY PLAN
SELECT DISTINCT mycolumn,myColumn|9223372036854775806 AS x FROM mytable;
EXPLAIN QUERY PLAN
SELECT /*(SELECT count() FROM mytable) AS rows,*/*, mycolumn|9223372036854775806 AS x FROM mytable GROUP BY myColumn|9223372036854775806;


SELECT DISTINCT mycolumn,myColumn|9223372036854775806 AS x FROM mytable LIMIT 100;
SELECT /*(SELECT count() FROM mytable) AS rows,*/*, mycolumn|9223372036854775806 AS x FROM mytable GROUP BY myColumn|9223372036854775806;

EXPLAIN QUERY PLAN
SELECT mycolumn FROM mytable WHERE mycolumn < 9223372036854775806 / 10 LIMIT 100;

SELECT mycolumn FROM mytable WHERE mycolumn < 9223372036854775806 / 10 LIMIT 100;

/* CLEAN UP */
DROP TABLE IF EXISTS mytable;
DROP INDEX IF EXISTS idx1;
DROP INDEX IF EXISTS idx2;
DROP INDEX IF EXISTS idx10;
DROP INDEX IF EXISTS idx20;
DROP INDEX IF EXISTS idx30;
DROP INDEX IF EXISTS idx40;
DROP INDEX IF EXISTS idx50;
DROP INDEX IF EXISTS idx60;
DROP INDEX IF EXISTS idx70;
DROP INDEX IF EXISTS idx80;
DROP INDEX IF EXISTS idx90;
DROP INDEX IF EXISTS idx100;
  • Note the above takes some 20 seconds or so to run, but that will depend upon the device on which it has been run.

The following is an example of the messages and timings from a run of the above :-

DROP TABLE IF EXISTS mytable
> OK
> Time: 1.173s


DROP INDEX IF EXISTS idx1
> OK
> Time: 0s


DROP INDEX IF EXISTS idx2
> OK
> Time: 0s


DROP INDEX IF EXISTS idx10
> OK
> Time: 0s


DROP INDEX IF EXISTS idx20
> OK
> Time: 0s


DROP INDEX IF EXISTS idx30
> OK
> Time: 0s


DROP INDEX IF EXISTS idx40
> OK
> Time: 0s


DROP INDEX IF EXISTS idx50
> OK
> Time: 0s


DROP INDEX IF EXISTS idx60
> OK
> Time: 0s


DROP INDEX IF EXISTS idx70
> OK
> Time: 0s


DROP INDEX IF EXISTS idx80
> OK
> Time: 0s


DROP INDEX IF EXISTS idx90
> OK
> Time: 0s


DROP INDEX IF EXISTS idx100
> OK
> Time: 0s


CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, mycolumn INTEGER DEFAULT (CAST(abs(random()) AS INTEGER)))
> OK
> Time: 0.056s


CREATE INDEX idx1 ON mytable(myColumn|9223372036854775806)
> OK
> Time: 0.024s


-- CREATE INDEX idx2 ON mytable(DISTINCT myColumn|9223372036854775806); CANNOT BE USED AS DISTINCT is syntax error.
-- CREATE INDEX idx3 ON mytable(myColumn) WHERE myColumn =  (SELECT DISTINCT myColumn|9223372036854775806 FROM mytable);
CREATE INDEX idx10 ON mytable(mycolumn) WHERE mycolumn < 9223372036854775806 / 10
> OK
> Time: 0.024s


CREATE INDEX idx20 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 10 AND mycolumn < 9223372036854775806 / 20
> OK
> Time: 0.024s


CREATE INDEX idx30 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 20 AND mycolumn < 9223372036854775806 / 30
> OK
> Time: 0.024s


CREATE INDEX idx40 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 30 AND mycolumn < 9223372036854775806 / 40
> OK
> Time: 0.024s


CREATE INDEX idx50 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 40 AND mycolumn < 9223372036854775806 / 50
> OK
> Time: 0.024s


CREATE INDEX idx60 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 50 AND mycolumn < 9223372036854775806 / 60
> OK
> Time: 0.024s


CREATE INDEX idx70 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 60 AND mycolumn < 9223372036854775806 / 70
> OK
> Time: 0.024s


CREATE INDEX idx80 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 70 AND mycolumn < 9223372036854775806 / 80
> OK
> Time: 0.024s


CREATE INDEX idx90 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 80 AND mycolumn < 9223372036854775806 / 90
> OK
> Time: 0.024s


CREATE INDEX idx100 ON mytable(mycolumn) WHERE mycolumn >= 9223372036854775806 / 90
> OK
> Time: 0.024s


/* Load some data */
WITH RECURSIVE cte(x,y) AS (
    SELECT 1,CAST(abs(random()) AS INTEGER)
    UNION ALL SELECT x+1,CAST(abs(random()) AS INTEGER) FROM cte LIMIT 1000000
)
INSERT INTO mytable (mycolumn) SELECT y FROM cte
> Affected rows: 1000000
> Time: 14.096s


/* Show some sample data */
SELECT mycolumn,myColumn|9223372036854775806 AS x FROM mytable LIMIT 100
> OK
> Time: 0.001s


/* LOOK AT WHAT THE QUERY PLANNER COMES UP WITH */
EXPLAIN QUERY PLAN
SELECT DISTINCT mycolumn,myColumn|9223372036854775806 AS x FROM mytable
> OK
> Time: 0s


EXPLAIN QUERY PLAN
SELECT /*(SELECT count() FROM mytable) AS rows,*/*, mycolumn|9223372036854775806 AS x FROM mytable GROUP BY myColumn|9223372036854775806
> OK
> Time: 0s


SELECT DISTINCT mycolumn,myColumn|9223372036854775806 AS x FROM mytable LIMIT 100
> OK
> Time: 0.001s


SELECT /*(SELECT count() FROM mytable) AS rows,*/*, mycolumn|9223372036854775806 AS x FROM mytable GROUP BY myColumn|9223372036854775806
> OK
> Time: 0.093s


EXPLAIN QUERY PLAN
SELECT mycolumn FROM mytable WHERE mycolumn < 9223372036854775806 / 10 LIMIT 100
> OK
> Time: 0s


SELECT mycolumn FROM mytable WHERE mycolumn < 9223372036854775806 / 10 LIMIT 100
> OK
> Time: 0s


/* CLEAN UP */
DROP TABLE IF EXISTS mytable
> OK
> Time: 0.793s


DROP INDEX IF EXISTS idx1
> OK
> Time: 0s


DROP INDEX IF EXISTS idx2
> OK
> Time: 0s


DROP INDEX IF EXISTS idx10
> OK
> Time: 0s


DROP INDEX IF EXISTS idx20
> OK
> Time: 0s


DROP INDEX IF EXISTS idx30
> OK
> Time: 0s


DROP INDEX IF EXISTS idx40
> OK
> Time: 0s


DROP INDEX IF EXISTS idx50
> OK
> Time: 0s


DROP INDEX IF EXISTS idx60
> OK
> Time: 0s


DROP INDEX IF EXISTS idx70
> OK
> Time: 0s


DROP INDEX IF EXISTS idx80
> OK
> Time: 0s


DROP INDEX IF EXISTS idx90
> OK
> Time: 0s


DROP INDEX IF EXISTS idx100
> OK
> Time: 0s
Community
  • 1
  • 1
MikeT
  • 51,415
  • 16
  • 49
  • 68