-1

I have a question on how the less than operator is used in this SQL query that answers the following English statement:

Drinkers who frequent at least two bars

NOTE: there is a table called frequents with columns of drinker and bar.

Now, I solved this originally by utilizing the group by and having count operators, but I have stumbled upon this query which also is correct:

SELECT DISTINCT f1.drinker
FROM frequents f1, frequents f2
WHERE f1.drinker = f2.drinker
AND f1.bar < f2.bar

I am having a little trouble understanding the mechanics of this query, specifically with the final AND statement.

Chad
  • 185
  • 1
  • 11
  • I believe that < sign is supposed to be <> which should give somewhat correct results since it still queries not equal. – Engin Sep 23 '15 at 00:03
  • 2
    @Engin the point of using the `<` instead of `<>` is preventing from getting the pairs in both orders. (a, b) and (b, a). Since there is a DISTINCT, it is slightly moot for final results, but could be improving performance. – Uueerdo Sep 23 '15 at 00:06
  • It's not moot. Without it *every* drinker would be returned. – aquinas Sep 23 '15 at 00:10
  • @Uueerdo hah! never thought of that, maybe because of the distinct. Thank you. – Engin Sep 23 '15 at 00:14
  • @aquinas the choice of `<` vs `<>` is moot, not the use of a general not equal operator. – Uueerdo Sep 23 '15 at 00:45

2 Answers2

0

The table is joined with itself on the same drinker. Now in order for f1.bar < f2.bar implies f1.bar != f2.bar. Thus 2 bars (maybe more).

Jimmy Hoffa
  • 29
  • 1
  • 4
0

It would have made a bit more sense if the final and said AND f1.bar <> f2.bar

So, the idea here is that there is a cartesian join happening: every row of table 1 is joined against every row of table 2. So you can imagine your result set consists of the number of rows in frequents times itself. e.g., if you have 10 rows in frequents you'll have 100 rows in your result set: each row joined with every other row. OK. So from that result set, you are now saying, well, I only care about the rows where the frequent id is the same but the bar they frequent is different. Obviously, the f1.bar <> f2.bar is necessary because otherwise every row would match with itself. This guarantees that the frequenter has at LEAST two rows that show up. The distinct takes care of removing the additional duplicates. Note: count(*) and group by is a much better approach.

aquinas
  • 23,318
  • 5
  • 58
  • 81