2

I have a query of the form:

SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry')

In my_table there are multiple rows for each person, and I want to exclude those people who have ever had status "hungry". This is a construct I regard as standard and have used in other SQL dialects, but this brings me back an empty result set in Athena.

On the other hand, the plain old IN construction works as expected.

Can anyone explain how I can write this query in Presto? I found another article on SO that seems to imply it works correctly, so I am a bit nonplussed.

Tom Wagstaff
  • 1,443
  • 2
  • 13
  • 15

4 Answers4

4

Do not use NOT IN. If any returned values are NULL then it returns no rows. Note: This is how SQL works, not a peculiarity of any particular database.

Instead, use NOT EXISTS:

SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT  
                  FROM my_table t2
                  WHERE t2.status = 'hungry' AND
                        t2.person_id = t.person_id
                 );

Actually, I might suggest aggregation for this instead -- you are already doing aggregation essentially with the SELECT DISTINCT:

select person_id
from my_table t
group by person_id
having sum(case when status = 'hungry' then 1 else 0 end) = 0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Ha, I'd assumed it wouldn't be the case but it does indeed turn out we have NULL IDs in that table! The EXISTS construct works - thank you – Tom Wagstaff Mar 13 '20 at 18:04
1

I would do aggregation :

SELECT person_id
FROM my_table
GROUP BY person_id
HAVING SUM(CASE WHEN status = 'hungry' THEN 1 ELSE 0 END) = 0;

If you want full row then use NOT EXISTS , NOT IN would return no row if sub-query have null :

SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1  
                  FROM my_table t1
                  WHERE t1.status = 'hungry' AND
                        t1.person_id = t.person_id
                 );    
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
1

Using conditional aggregation:

SELECT person_id
FROM my_table m
GROUP BY person_id
HAVING COUNT(CASE WHEN status='hungry' THEN 1 END)=0
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
0

I feel compelled to point out that you can solve this by just excluding the NULLs explicitly from the subquery, and sticking with the NOT IN construct:

SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry' AND person_id IS NOT NULL)
Tom Wagstaff
  • 1,443
  • 2
  • 13
  • 15