Relational Algebra equivalent of SQL "NOT IN"

Question

Is there a relational algebra equivalent of the SQL expression NOT IN?

For example if I have the relation:

A1  |  A2
----------
x   |  y
a   |  b
y   |  x

I want to remove all tuples in the relation for which A1 is in A2. In SQL I might query:

SELECT
    *
FROM
    R
WHERE
    R.A1 NOT IN
        (
        SELECT
            A2
        FROM
            R
        )
/

What is really stumping me is how to put a subquery inside the relational algebra selection operator. Is this possible?

σ_{some subquery here}R

Andomar · Accepted Answer · 2012-09-22T11:37:56.823

6

In relational algebra, you can do this using a Cartesian product. Something like:

R - ρ_{a1,a2}(π_{a11,a21}(σ_{a11 = a22}(ρ_{a11,a21}(R) × ρ_{a12,a22}(R))))

Rename the columns of R, e.g. a1 becomes a11 in the left-hand copy and a12 in the right-hand copy.
Take the cross product of the two copies of R with renamed columns.
Select the rows where a11 equals a22.
Project out a12 and a22, keeping a11 and a21.
Rename a11 and a21 back to a1 and a2.

That gives you the rows that were matched. Subtract this from R to find the rows that were not matched.
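The steps above can be sketched in Python, treating a relation as a set of tuples. This is only an illustration under that assumption, using the sample relation from the question; the variable names are made up for the example, and the renaming steps are implicit because tuple positions stand in for column names.

```python
from itertools import product

# Sample relation R(A1, A2) from the question, as a set of (a1, a2) tuples.
R = {("x", "y"), ("a", "b"), ("y", "x")}

# Cross product of two copies of R.
# Each element is ((a11, a21), (a12, a22)).
pairs = product(R, R)

# Select the pairs where a11 = a22, then project onto the left copy (a11, a21).
matched = {left for left, right in pairs if left[0] == right[1]}

# Subtract the matched rows from R to get the rows whose A1 is not in A2.
result = R - matched
print(result)  # {('a', 'b')}
```

Only ('a', 'b') survives, because 'x' and 'y' both appear somewhere in column A2.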

edited Sep 22 '12 at 11:37

answered Sep 22 '12 at 07:33

Andomar


1

Can you please explain how this works, and maybe expand the ellipsis dots? I am having trouble understanding the result of the cross product: there are only two fields in R, so how can you apply the pi operator to it with more than two arguments? – jsj Sep 22 '12 at 10:59
If there are only two columns you can omit the ellipsis dots. The answer also used PI where it should have used RHO, not sure if that was in the edit or the original answer. – Andomar Sep 22 '12 at 11:33

AntC · Answer 2 · 2013-10-02T23:34:12.243

The opening question sends us down the wrong path. It should be:

Is there a relational algebra equivalent of the SQL expression R WHERE ... [NOT] IN S?

(That is, the answer is some operation between two relations, not some sort of filter.)

The answer is Yes, it is (Natural) JOIN aka the bowtie operator ⋈.

To see why, let's first tidy up the SQL solution given. As shown, it's looking for attribute A1 NOT IN a relation with single attribute A2. That's really a mis-match in attribute names. SQL also allows NOT inside the where condition. This SQL makes the logical structure clearer:

SELECT * FROM R
WHERE NOT (A1 IN (SELECT A2 AS A1 FROM R) )

Now we can see a projection and a rename. (The surrounding NOT we can implement as set MINUS, as per the first answer.) So the equivalent RA is:

R - (R ⋈ ρ_A1⁄A2(π_A2(R)))

For interest, the Tutorial D is:

R MINUS (R JOIN (R {A2} RENAME A2 AS A1))
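As a rough check of the expression above, here is a minimal Python sketch. It assumes a relation is modelled as a set of rows, each row a frozenset of (attribute, value) pairs; the helper names rel, project, rename and natural_join are hypothetical and exist only for this illustration.

```python
def rel(rows):
    """Build a relation: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}

def project(r, attrs):
    """Keep only the named attributes (duplicates collapse because r is a set)."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def rename(r, old, new):
    """Rename attribute `old` to `new` in every row."""
    return {frozenset((new if a == old else a, v) for a, v in row) for row in r}

def natural_join(r, s):
    """Combine rows that agree on every attribute the two relations share."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out

R = rel([{"A1": "x", "A2": "y"}, {"A1": "a", "A2": "b"}, {"A1": "y", "A2": "x"}])

# R - (R JOIN (R {A2} RENAME A2 AS A1))
result = R - natural_join(R, rename(project(R, {"A2"}), "A2", "A1"))
print([dict(sorted(row)) for row in result])  # [{'A1': 'a', 'A2': 'b'}]
```

Because the renamed projection has the single attribute A1, the natural join adds no new attributes to R's rows, so the difference is taken between relations with the same heading.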

In the way the question is put, there's a hangover from SQL thinking. SQL's WHERE forces you into row-level 'mode'. This is contra Codd's rule 7 requiring set-at-a-time operators.

In general, SQL's WHERE and RA's σ with their row-level filters can be more succinctly implemented as (Natural) JOIN with set-at-a-time logic. (For example, this is what Date & Darwen do in their A algebra.)
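To illustrate the idea, a restriction such as σ_{A1 = 'a'}(R) can be written as a natural join of R with a one-row, one-attribute relation. The Python sketch below is an assumption-laden illustration: relations are plain lists of dicts (so duplicates are not eliminated), and natural_join and the constant relation C are names invented for this example.

```python
def natural_join(r, s):
    """Combine rows of r and s that agree on every shared attribute."""
    return [
        {**x, **y}
        for x in r
        for y in s
        if all(x[a] == y[a] for a in x.keys() & y.keys())
    ]

R = [{"A1": "x", "A2": "y"}, {"A1": "a", "A2": "b"}, {"A1": "y", "A2": "x"}]

# A constant relation with heading {A1} and a single tuple.
C = [{"A1": "a"}]

# Set-at-a-time: join R with the constant relation.
restricted = natural_join(R, C)

# Row-at-a-time: the usual filter, for comparison.
filtered = [row for row in R if row["A1"] == "a"]

print(restricted)  # [{'A1': 'a', 'A2': 'b'}]
```

Both approaches return the same single row here; the join version expresses the condition as data (the relation C) rather than as a per-row predicate.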

score 1 · Answer 3 · edited Nov 24 '19 at 20:44


A direct answer to a more general question:

SELECT
    *
FROM
    R
WHERE
    R.A1 NOT IN
        (
        SELECT
            A2
        FROM
            S
        );

The answer is:

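R - (R ⋉_{R.A1 = S.A2} π_{A2}(S))

Here ⋉ is the semijoin, i.e. the join R ⋈_{R.A1 = S.A2} π_{A2}(S) projected back onto the attributes of R, so that the difference is taken between relations with the same heading.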
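One way to sanity-check this is to run the SQL in an in-memory SQLite database and compare it with the set-based expression. The sketch below assumes a hypothetical S(B1, A2) with made-up sample rows and no NULL values (SQL's NOT IN behaves differently when the subquery returns NULL).

```python
import sqlite3

R = {("x", "y"), ("a", "b"), ("y", "x")}
S = {("p", "x"), ("q", "z")}  # hypothetical relation S(B1, A2)

# Relational algebra side, with relations as sets of tuples.
pi_a2_s = {(a2,) for _, a2 in S}                            # projection of S onto A2
matched = {r for r in R for s in pi_a2_s if r[0] == s[0]}   # join on R.A1 = S.A2, keeping R's columns
algebra_result = R - matched

# SQL side.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE R (A1 TEXT, A2 TEXT)")
con.execute("CREATE TABLE S (B1 TEXT, A2 TEXT)")
con.executemany("INSERT INTO R VALUES (?, ?)", sorted(R))
con.executemany("INSERT INTO S VALUES (?, ?)", sorted(S))
sql_result = set(
    con.execute("SELECT * FROM R WHERE R.A1 NOT IN (SELECT A2 FROM S)")
)
con.close()

print(algebra_result == sql_result)  # True
```

With this sample data, both sides keep ('a', 'b') and ('y', 'x') and drop ('x', 'y'), since 'x' is the only A1 value that appears in S.A2.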

edited Nov 24 '19 at 20:44

Das_Geek


answered Nov 24 '19 at 19:40

weikai jia

