1

I can't figure out how to do limit within group although I've read all similar questions here. Reading PSQL doc didn't help either :( Consider the following:

CREATE TABLE article_relationship
(
    article_from INT NOT NULL,
    article_to INT NOT NULL,
    score INT
);

I want to get a list of top 5 related articles per given article IDs sorted by score.

Here is what I tried:

select DISTINCT o.article_from
from article_relationship o
join lateral (
       select i.article_from, i.article_to, i.score from article_relationship i
       order by score desc
       limit 5
       ) p on p.article_from = o.article_from
where o.article_from IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.article_from;

And it returns nothing. I was under impression that outer query is like loop so I guess I only need source IDs there.

Also what if I want to join on articles table where there are columns id and title and get titles of related articles in resultset?

I added join in inner query:

select o.id, p.*
from articles o
join lateral (
       select a.title, i.article_from, i.article_to, i.score
       from article_relationship i
       INNER JOIN articles a on a.id = i.article_to
       where i.article_from = o.id
       order by score desc
       limit 5
       ) p on true
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;

But it made it very very slow.

Patrick
  • 29,357
  • 6
  • 62
  • 90
expert
  • 29,290
  • 30
  • 110
  • 214

2 Answers2

1

The problem with no rows returning from your query is that your join condition is wrong: ON p.article_from = o.article_from; this should obviously be ON p.article_from = o.article_to.

That issue aside, your query will not return the top 5 scoring relations per article id; instead it will return the article IDs that reference one of the 5 top rated referenced articles throughout the table and (also) at least 1 of the 5 referenced articles for which you specify the id.

You can get the top 5 rated referenced articles per referencing article with a window function to rank the scores in a sub-select and then select only the top 5 in the main query. Specifying a list of referenced article IDs effectively means that you will rank how these referenced articles are scored for each referencing article:

SELECT article_from, article_to, score
FROM (
    SELECT article_from, article_to, score, 
           rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
    FROM article_relationship
    WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
WHERE rnk < 6
ORDER BY article_from, score DESC;

This is different from your code in that it returns up to 5 records for each article_from but it is consistent with your initial description.

Adding columns from table articles is trivially done in the main query:

SELECT a.article_from, a.article_to, a.score, articles.*
FROM (
    SELECT article_from, article_to, score, 
           rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
    FROM article_relationship
    WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
JOIN articles ON articles.id = a.article_to
WHERE a.rnk < 6
ORDER BY a.article_from, a.score DESC;
Patrick
  • 29,357
  • 6
  • 62
  • 90
  • Thank you, Patrick! I also posted join lateral version I came up with last night. Could you please tell me pros/cons of your version vs. lateral join ? In my measurements your version is ~15% faster. – expert Dec 22 '16 at 12:40
  • 1
    Glad to be of help. I can not tell why my version would be faster, though. The approach is rather different and changes in speed can also be caused by index use. In general, though, window functions align very closely with the relational data model (working with sets) so these functions may be better fitted to how the data is structured internally. But even for experienced SQL developers, efficiency is sometimes more of a Black Art than science! – Patrick Dec 23 '16 at 03:25
1

Version with join lateral

select o.id as from_id, p.article_to as to_id, a.title, a.journal_id, a.pub_date_p from articles o
  join lateral (
       select i.article_to from article_relationship i
       where i.article_from = o.id
       order by score desc
       limit 5
       ) p on true
  INNER JOIN articles a on a.id = p.article_to
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;
expert
  • 29,290
  • 30
  • 110
  • 214