
Articles are parsed from an RSS feed, and each article may fall into several categories. Each article also carries some metadata, such as source, upstream, etc.

Below is how we are designing the spaces. Each article is inserted into the articles space:


articles space

urlhash | article.content
abcdef  | {dummy content}

Primary key: urlhash = hash(article.url).
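A minimal sketch of writing into the articles space, assuming the space already exists with its primary index on field 1 (urlhash). The helper name save_article and the injected hash function are hypothetical; inside Tarantool, require('digest').sha1_hex is one possible choice for the hash. Passing the space and the hash in as arguments keeps the function testable outside Tarantool.

```lua
-- Hypothetical helper: store an article under hash(url).
-- `articles` is assumed to be a Tarantool space object (e.g. box.space.articles)
-- with its primary index on field 1; `hash` is any string -> string function.
local function save_article(articles, hash, url, content)
    local urlhash = hash(url)
    -- replace() inserts the tuple, or overwrites an existing one with the same
    -- primary key, so re-parsing the same feed item does not raise an error.
    articles:replace({urlhash, content})
    return urlhash
end

-- Assumed usage inside Tarantool:
--   local digest = require('digest')
--   save_article(box.space.articles, digest.sha1_hex, article.url, article.content)
```

Using replace rather than insert is a design choice: RSS feeds re-deliver the same items, and the URL hash makes those re-deliveries land on the same key.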


In the category_articles space, we insert the article once for each category it falls into:

category_articles

source | category | urlhash | timestamp
bbc | arts | article1 | 27777
bbc | mobile | article8 | 27777
bbc | phone | article3 | 27778
nyt | sound | article7 | 36667
nyt | speaker | article7 | 45556

primary key = {source, category, urlhash}
secondary key = {source, category, timestamp}
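The schema and the per-category fan-out might look roughly like this. It is a sketch under assumptions: the field types (string keys, unsigned timestamp) and the helper name index_article are guesses, not taken from the post. The secondary index has to be declared non-unique, because two articles in the same source and category can share a timestamp.

```lua
-- Schema, run once inside Tarantool (field types are assumptions):
--   local s = box.schema.space.create('category_articles', {if_not_exists = true})
--   s:create_index('primary', {parts = {{1, 'string'}, {2, 'string'}, {3, 'string'}},
--                              if_not_exists = true})
--   s:create_index('secondary', {parts = {{1, 'string'}, {2, 'string'}, {4, 'unsigned'}},
--                                unique = false, if_not_exists = true})

-- Hypothetical helper: write one tuple per category the article belongs to.
-- `category_articles` is assumed to be a space object such as box.space.category_articles.
local function index_article(category_articles, source, categories, urlhash, timestamp)
    for _, category in ipairs(categories) do
        -- The primary key {source, category, urlhash} identifies each
        -- (source, category, article) triple, so replace() makes re-indexing idempotent.
        category_articles:replace({source, category, urlhash, timestamp})
    end
    return #categories
end
```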

I need the latest articles for a given source and possibly a category. This is how I framed the query:

box.space.category_articles.index.secondary:select({'nyt', 'speaker'}, {iterator = 'LE', limit = 5})

Now article7 appears twice in the result. Currently I filter out the duplicates in application code. How can I get something like distinct(urlhash) in Tarantool?
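A likely reason for the duplicate, if I read Tarantool's iterator documentation correctly: with a partial key, the LE iterator on a TREE index compares only the supplied key parts and then keeps walking downwards in descending key order, past {'nyt', 'speaker'} into {'nyt', 'sound'} and even into the bbc rows. The pure-Lua sketch below is a model of that documented behaviour over the sample rows (not Tarantool code), so the order can be checked by hand; rows, compare and emulate_le are illustrative names.

```lua
-- Sample rows from category_articles: {source, category, urlhash, timestamp}.
local rows = {
    {'bbc', 'arts',    'article1', 27777},
    {'bbc', 'mobile',  'article8', 27777},
    {'bbc', 'phone',   'article3', 27778},
    {'nyt', 'sound',   'article7', 36667},
    {'nyt', 'speaker', 'article7', 45556},
}

-- Secondary index fields: {source, category, timestamp}.
local SECONDARY = {1, 2, 4}

-- Compare a row's secondary key with a (possibly partial) search key,
-- looking only at the parts the search key supplies. Returns -1, 0 or 1.
local function compare(row, key)
    for i, v in ipairs(key) do
        local r = row[SECONDARY[i]]
        if r < v then return -1 elseif r > v then return 1 end
    end
    return 0
end

-- Model of select(key, {iterator = 'LE', limit = limit}): every row whose
-- key is <= the search key, in descending secondary-key order.
local function emulate_le(key, limit)
    local sorted = {}
    for _, row in ipairs(rows) do table.insert(sorted, row) end
    table.sort(sorted, function(a, b)
        for _, f in ipairs(SECONDARY) do
            if a[f] ~= b[f] then return a[f] > b[f] end
        end
        return false
    end)
    local out = {}
    for _, row in ipairs(sorted) do
        if #out == limit then break end
        if compare(row, key) <= 0 then table.insert(out, row[3]) end
    end
    return out
end

print(table.concat(emulate_le({'nyt', 'speaker'}, 5), ', '))
-- article7, article7, article3, article8, article1
```

If that reading is right, iterator = 'REQ' (reverse EQ) would keep the walk inside {'nyt', 'speaker'}, newest first, and the duplicate would only come back when querying by source alone, for example {'nyt'}, which spans both sound and speaker.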

Community
crackerplace

2 Answers


I was able to find a better solution: iterate with the pairs function on the index and filter the articles (tracking the unique ones in a Lua table) until I have the required number of unique articles.

index_object:pairs([key[, iterator-type]])
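A sketch of that approach, assuming the category_articles layout from the question (urlhash in field 3). take_distinct is a hypothetical helper, not a Tarantool API: it accepts any generic-for iterator triple, such as the values returned by index_object:pairs(), and stops pulling tuples once limit distinct urlhash values have been collected, so the limit applies to unique articles rather than to raw tuples.

```lua
-- Hypothetical helper: collect up to `limit` tuples whose `field` values are
-- distinct, pulling lazily from a generic-for iterator triple (gen, param, state).
local function take_distinct(limit, field, gen, param, state)
    local seen, result = {}, {}
    for _, tuple in gen, param, state do
        local k = tuple[field]
        if not seen[k] then
            seen[k] = true
            table.insert(result, tuple)
            -- Stop as soon as enough unique tuples have been collected.
            if #result >= limit then break end
        end
    end
    return result
end

-- Assumed usage inside Tarantool: newest-first walk over one source,
-- deduplicated on urlhash (field 3), at most 5 unique articles.
--   local idx = box.space.category_articles.index.secondary
--   local latest = take_distinct(5, 3, idx:pairs({'nyt'}, {iterator = 'REQ'}))
```

The REQ iterator in the commented usage is an assumption about what fits this query: it returns only tuples matching the given key prefix, in descending order. For a source-only key the secondary index still orders by category before timestamp, so the result is not globally newest-first; that would need a different index, such as one on {source, timestamp} (again an assumption, not part of the original design).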

crackerplace

There are two possible options:

  1. Filter everything on the client side.
  2. Use a Lua stored procedure. For example:

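    function select_with_distinct()
        local ca = box.space.category_articles
        local seen, result = {}, {}
        for _, v in ipairs(ca.index.secondary:select({'nyt', 'speaker'}, {iterator = 'LE', limit = 5})) do
            -- filtering: keep only the first tuple for each urlhash (field 3)
            if not seen[v[3]] then
                seen[v[3]] = true
                table.insert(result, v)
            end
        end
        return result
    end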

  • Yes, but this is still limited: of the 5 articles returned with limit = 5, many may be duplicates. What I was looking for was a way to put the limit on unique articles. – crackerplace Jul 13 '17 at 08:19