-1

I have the database consists of 4 tables:

  1. users(id, "name", surname, birthdate)
  2. friendships(userid1, userid2, "timestamp")
  3. posts(id, userid, "text", "timestamp")
  4. likes(postid, userid, "timestamp")

I need to get a result set of unique usernames having more than 3 friendships within January of 2018 and their "likes" average per "post" in the range of [10; 35).

I wrote this statement for the first step:

select  distinct u."name"
from users u
join friendships f on u.id = f.userid1
where f."timestamp" between '2018-01-01'::timestamp and '2018-01-31'::timestamp
group by u.id
having count(f.userid1) > 3;

It's working fine and returns 3 rows. But when I'm adding the second part this way:

select  distinct u."name"
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
join likes l on p.id = l.postid
where f."timestamp" between '2018-01-01'::timestamp and '2018-01-31'::timestamp
group by u.id
having count(f.userid1) > 3 
    and ((count(l.postid) / count(distinct l.postid)) >= 10 
        and (count(l.postid) / count(distinct l.postid)) < 35);

I'm getting crazy 94 rows. I don't know why. Will be thankful for possible help.

Julian Kolodzey
  • 359
  • 3
  • 9
  • 1
    One whole day of January is missing: what happens if f."timestamp" is `2018-01-31 13:14:15.12345` ? **Don't** use `between` for date ranges. – Paul Maxwell Dec 10 '18 at 01:31
  • Missed it... Thanks:) – Julian Kolodzey Dec 10 '18 at 08:52
  • A [mcve] includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples. – philipxy Dec 10 '18 at 22:20
  • This seems likely a faq where people need the join of multiple tables where some are aggregations (maybe of joins) but they try to do the aggregation(s) at the end. – philipxy Dec 10 '18 at 22:22

2 Answers2

1

You don't need distinct in u.name because aggregate will remove the duplicate.

select
   u."name"
from 
   users u
   inner join friendships f on u.id = f.userid1
   inner join posts p on u.id = p.userid
   inner join likes l on p.id = l.postid
where 
   f."timestamp" >= '2018-01-01'::timestamp 
   and f."timestamp" < '2018-02-01'::timestamp
group by 
    u."name"
having 
    count(distinct f.userid1) > 3 
    and ((count(l.postid) / count(distinct l.postid)) >= 10 
            and (count(l.postid) / count(distinct l.postid)) < 35);

As in comment stated. Not good idea when you use between for date to do range.

f."timestamp" >= '2018-01-01'::timestamp 
and f."timestamp" < '2018-02-01'::timestamp

Will give you a full month of January.

dwir182
  • 1,539
  • 10
  • 20
  • 1
    I think you mean `and f."timestamp" < '2018-02-01'::timestamp` – Paul Maxwell Dec 10 '18 at 01:32
  • 1
    Almost, don't use `<=` just `<` To be absolutely accurate we want everything in January, but not one instant of February. – Paul Maxwell Dec 10 '18 at 01:49
  • Thank you. Your query returns 437 rows while the query: 'select u."name" from users u join friendships f on u.id = f.userid1 where f."timestamp" >= '2018-01-01'::timestamp and f."timestamp" < '2018-02-01'::timestamp group by u.id having count(f.userid1) > 3;' returns only 3 rows. – Julian Kolodzey Dec 10 '18 at 10:09
  • @JulianKolodzey i am sorry i am not notice.. Your query weird.. You `select u."name"` but do `group by u.id`.. `Group by` must column you are not doing aggregate in your `select` statement.. – dwir182 Dec 10 '18 at 10:17
  • @dwir182 Standard SQL allows select of columns that are deduced by certain rules to be functionally dependent on (single-valued per) group by columns. – philipxy Dec 10 '18 at 22:29
0

Try the below! The issue with using "count(f.userid1) > 3" is that if a user has , e.g. 2 friends and 6 posts and 3 likes they'll get 2 x 6 = 12 rows, so 12 records with non-null f.userid1. By counting distinct f.userid2 you can count distinct friends. Similar issues appear for the other counts used for filtering.

select  u."name"
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
left join likes l on p.id = l.postid
where f."timestamp" > '2018-01-01'::timestamp and f."timestamp" < '2018-02-01'::timestamp
group by u.id, u."name"
having
 --at least three distinct friends
 count( distinct f.userid2) > 3 
  --distinct likes / distinct posts
  --we use l.* to count distinct likes since there's no primary key
  and ((count(distinct l.*) / count(distinct p.id)) >= 10 
        and ((count(distinct l.*) / count(distinct p.id)) < 35);
George S
  • 2,041
  • 9
  • 13
  • 'at least three distinct friends' - I guess we need to find all the instances of the particular user appearing in the 'friendship' table to find out how much friends he has in the part of the time. And I think, it doesn't matter unique persons they are or duplicates itself. So I've used 'count(f.userid1)'... – Julian Kolodzey Dec 10 '18 at 09:16
  • 'count(f.userid1) > 3' will return true if the user has only two friends but three posts (since there will be rows for posts x friends), or four posts, which I do not think is what you want. Also, if you want to include posts with zero likes in the average, you would need to use a left join from posts to likes. I've updated the answer to take this in to account. – George S Dec 10 '18 at 18:25