0

Table for explanation:

CREATE TABLE test
(
    id    INT,
    uniuque_string      VARCHAR,
    another_id INT,
    PRIMARY KEY ((id, uniuque_string))
);

Sometimes it is necessary to make such requests:

SELECT * FROM test WHERE another_id = 12;

but another_id is not a primary key.

How can I create a hash index (for example, like in mysql) by this column?

Gregory Sysoev
  • 173
  • 1
  • 15

1 Answers1

0

Scylla and Cassandra have two features that fit your needs secondary index and materialized view:

You enable a secondary index on another_id with the command:

CREATE INDEX ON test (another_id);

And now your query SELECT * FROM test WHERE another_id = 12; will work normally. You should be aware that if this SELECT returns a huge number of results (a so-called low-cardinality index) this select is inefficient - it retrieves just the keys from the index, and to show everything (the SELECT *) Scylla needs to go back to the table and fetch its rows one by one, a fairly inefficient process.

The second alternative is a materialized view with another_id as a partition key, as in the command:

CREATE MATERIALIZED VIEW test_by_another_id AS
    SELECT * FROM test
    WHERE another_id IS NOT NULL AND uniuque_string IS NOT NULL and id IS NOT NULL
    PRIMARY KEY (another_id, id, uniuque_string);

This materialized view will contain a copy of all the data (not just keys) from test, and can be searched by another key. For example: SELECT * FROM test_by_another_id WHERE another_id = 12; (note we had to replace test by the viewtest_by_another_id.

The materialized-view solution will make long selects much more efficient than the secondary-index solution, but it requires more storage (all the data is stored twice), so both solutions should be considered depending on your use case.

Nadav Har'El
  • 11,785
  • 1
  • 24
  • 45
  • Table is too huge for materialized view. – Gregory Sysoev Dec 23 '22 at 02:46
  • What storage structure will the secondary index have? – Gregory Sysoev Dec 23 '22 at 02:49
  • What makes a table "too huge for a materialized view"? At worst - if you copy all columns to the view, it will double the table size. Whether or not this is worth it depends on your access pattern. A secondary index is implemented differently in Scylla and Cassandra (and Cassandra has several different implementations). But conceptually, it is a materialized view of just the indexed column and the keys, and nothing else - if you want to read additional columns while scanning the index, the result will be reads from the base table. – Nadav Har'El Dec 23 '22 at 22:55