Our use case is as below
We will be getting articles from a source.
Each article has certain metadata.The metadata has a field categories which is a list of category names(mobiles,laptops etc) that the article might fall into.Similarly the metadata has a keywords field with a list of keywords that the article might fall into.
Below are the query's that we might need
a) Get all articles for a given category and are from a certain source,location.Source and location are metadata fields of the article.
b) Get all articles for a given category,source,location.
c) Get all articles for a given category,location.
d) Get all articles for a given keyword,category,location.
Below is how we are designing the spaces.So upon getting each article,we insert the article into articles space and also insert into category_articles and keyword_articles space based on which categories and keywords this article falls into.
Below are few queries I have on how to efficiently create the secondary indexes for the given query patterns.
a)For a secondary key {category,source,location},if we miss certain parts in the input value ,then will the rows match if other parts of key are equal ? https://tarantool.org/doc/singlehtml.html#lua-data.index_object.parts
I know that we cant specify 1st and 3rd part in input i.e {'category','location'} for the key ? or shud we have a nil at second position.
b)Can we model the schema in a better way so as to efficiently query for the given patterns.
articles
urlhash
title
content
url
date
primary key : {urlhash}
--
category_articles
id
category
source
location
{urlhash,..}
primary key : {id}
secondary key : {category,source,location}
--
keyword_articles
id
keyword
category
source
location
{urlhash,..}
primary key : {id}
secondary key : {keyword,category,source,location}
Updates:
Query :
box.space.category_articles.index.secondary_key:select{'bollywoord',nil,'india'}
Ouput :
[1, 'bollywoord', 'ht', 'india', ['url1', 'url2']]
[3, 'bollywoord', 'ht', 'us', ['url3', 'url4']]