Pre Requirement : I need to find all the matching result in one query which is more than 40K results.
Requirement : Two tables - product and product_category. I am trying to fetch all the products with matching category from product_category table.
Table Structure :
CREATE TABLE catalog.product (
product_id string PRIMARY KEY index using plain,
name string,
sku string,
) clustered by (product_id) into 4 shards;
create table catalog.product_category (
category_id string primary key index using plain,
product_id string primary key index using plain,
INDEX product_category_index using plain(product_id, category_id)
active boolean,
parent_category_id integer,
updated_at timestamp
);
Join Query :
select p.product_id from catalog.product_category pc join catalog.product p on p.product_id=pc.product_id limit 40000;
Tried multiple things - indexing product_id (both as integer and string) etc.
Result : To showup 35K result it is taking more than 90 seconds everytime.
Problem : What can I do to optimize the query response time?
Some Other Information : - CPU Core -4 - Tried with one or multiple nodes - Default Sharding - Total number of products - 35K and product_category has 35K enteries only.
Usecase : I am trying to use crateDB as persistent cache but with the given query response time we can't really do that. So we will move to some in-memory database like REDIS or Memcache. The reason to choose crateDB was the querying ability on your persistent data.