Understanding Division in Relational Algebra

Question

I have a bit of trouble understanding this slide regarding division in Relational Algebra. I did some research and was referred to by many people to On Making Relational Algebra Comprehensible by Lester I McCann. I'm having trouble on understanding one of the slides (Slide 13). I recreate the slides below essentially.

Query: Find the sno value of the suppliers that supply all parts of weight equal to 17.

Relation P

+-------------------------------+
| pno pname color weight city   |
+-------------------------------+
| P1  Nut   Red   12.0   London |
| . . . . . . . . . . . . . . . |
| P6  Cog   Red   19.0   London |
+-------------------------------+

Relation SPJ

+-------------------------+
| sno pno jno qty         |
+-------------------------+
| S1  P1  J1  200         |
| . . . . . . . . . . . . |
| S5  P6  J4  500         |
+-------------------------+

I understand that I need the following schema. Relation A projects a list of sno, pno. Relation B tells you which pno equals to 17 weight.

α (sno, pno)
β (pno) 
α ← π sno,pno (SPJ)
β ← π pno (σ weight=17 (P))

Result:

Relation α

+---------+
| sno pno |
+---------+
| S1 P1   |
| S2 P3   |
| S2 P5   |
| S3 P3   |
| S3 P4   |
| S4 P6   |
| S5 P1   |
| S5 P2   |
| S5 P3   |
| S5 P4   |
| S5 P5   |
| S5 P6   |
+---------+

Relation β:

+-----+
| pno |
+-----+
| p2  |
| p3  |
+-----+

However the slide then goes on to say:

Find the values that do not belong in the answer, and remove them from the list of possible answers.

In our P–SPJ example, the list of possible answers is just the available sno values in α:
+-----+
| sno |
+-----+
| S1  |
| S2  |
| S3  |
| S4  |
| S5  |
+-----+

This is where I'm stuck. He says "P - SPJ" in the example but if I do that I don't get the relation above. I don't think it's possible to even do P - SPJ? According to A First Course in Database Systems, when we apply difference operation to relations, the two tables need to have schemas with identical sets of attributes (which P and SPJ do not have)?

If someone could just point me in the right direction that would be great thanks! I have the book A First Course in Database Systems, Chapter 4 which teaches Relational Algebra but unfortunately does not teach division (which I stumbled upon and wanted to learn).

score 3 · Accepted Answer · answered Dec 18 '18 at 01:41

3

Find the values that do not belong in the answer, and remove them from the list of possible answers.

When they say "Find the values that do not belong in the answer", that is something that they do later. That relation of "values that do not belong" will be π sno (δ).

When they say "and remove them from the list of possible answers" they mean that the answer is a relational difference that they finally do later between a "list of possible answers" relation that they find next & π sno (δ) that they find after that.

In our P–SPJ example, the list of possible answers is just the available sno values in α:

When they say "In our P-SPJ example, ..." they just mean "In our example involving relations P & SPJ, ...". They are using a dash; they are not using a minus sign for relational difference. What they next calculate & show is the "list of possible answers" relation π sno (α).

(Finally later on they get the answer, which is π sno (α) - π sno (δ).)

answered Dec 18 '18 at 01:41

philipxy

14,867
6
39
83

Ah okay I see. Wow okay I was really over thinking it then. When they do π sno (δ). The result they had listed (s1, s2, s3, s4, s5) is because in relational algebra duplicates are always eliminated? just wanted to confirm thank you! – Maggie Liu Dec 19 '18 at 22:54
A relation holds a set of rows; it has no duplication of rows; it has no "duplicate rows". Relation operators return what they are defined to return. They return relations; those have no duplication of rows; they have no "duplicate rows". You might give a *description* of what an operator returns *in terms of non-relations* that can in some sense have "duplicate rows" where that description *happens* to generate "duplicate rows" that it then "eliminates". But it is not meaningful to talk re algebra operators "eliminating duplicates" because it is *your particular description* that does that. – philipxy Dec 19 '18 at 23:11
Right. so I went back to the textbook. In response to your point "a relation is a set of rows". I found in the textbook: "A relation is a set of tuples, not lists thus the order in which the tuples are represented is immaterial." (Just trying to look up what you're saying so I can have a better understanding" You said "they return relations; those have no duplication of rows" - Does it make sense to say, if there is a "duplication of rows" then it is not a relation? – Maggie Liu Dec 19 '18 at 23:26
I'm just trying to figure out how they got (S1, S2, S3, S4, S5) using πsno(α). and if you use the project operator according to the stanford relational algebra (video 2) under duplicates. "semantics of relational algebra says that duplicates are always eliminated". – Maggie Liu Dec 19 '18 at 23:28
PS So nothing in the slides involves "duplicate rows" & the only ones would be in your head if you personally happen to think of some operator(s) as returning the same result *as if* they involved some kind of non-relation thing containing duplicates that are eliminated on the way to finding the result. PS A definition free of "duplicate rows": `π s (r)` is the relation holding the set of rows with attribute set written "{ `s` }" that are subrows of `r`. – philipxy Dec 19 '18 at 23:30
You quoted "semantics of relational algebra says that duplicates are always eliminated" but that is talking about *some particular description(s) they used* for some operator(s). Think: When they say "duplicates are always eliminated" what do they *mean*? They mean eliminated *if* you were getting an operator result by using their particular result *description* that involves a *list* that can have duplicates. The *result relation* & any other relation they might use in a description *does not have duplicate rows aka tuples*. PS I you have more please google & if necessary post anew. – philipxy Dec 19 '18 at 23:41
PS Whenever you are reading about a thing that has no duplicate tuples, it cannot be a relation because relations have no duplicate tuples. But why do you think you need to be told this? Of course you should understand what a relation is. But if you are reading about a thing--whether it has duplicates or not--*why don't you already know what kind of thing it is*?? – philipxy Dec 19 '18 at 23:49
Hi thanks for your responses! So you say "they mean eliminated if you were getting an operator result by using their particular result description that involves a list that can have duplicates". So my reasoning was correct regarding πsno(α)? After using the project operator, duplicates will be eliminated. – Maggie Liu Dec 20 '18 at 01:16
I appreciate your help and I was just trying to show that I was listening (in response to your PS about "why don't I already know what kind of a thing it is")". Next time I won't respond to something that I can "please google"! Thanks for your help again. – Maggie Liu Dec 20 '18 at 01:22
Please communicate a complete idea when you write. You say "After using the project operator, duplicates will be eliminated."--*in some process that you are thinking of, which you didn't give, which is unknown to me, which I can't confirm anything about*. I said to think--*in what way are "duplicates eliminated" & why even mention it when we know **π returns a relation**, so cannot have duplicates*? Try to stop thinking in terms of the structure of *definitions* & think about the structure of the *defined things*. After using π you have a relation. And relations don't have duplicates. – philipxy Dec 20 '18 at 01:43
Re "why don't I already know what kind of a thing it is": I didn't mean that you should research unfamiliar words to find out what they mean. I assume that you would do that. I meant that when you read about *some property of a thing* you should already know from what you just read (& researched) *what kind of thing the thing is*. If you don't then you weren't paying attention when the thing was introduced & you aren't paying attention when the property is ascribed. – philipxy Dec 20 '18 at 02:03
Your text 3rd edition introduces π in 2.4.5 Projection but *it does not clearly define it*. It would be clear if it said to make a list of the tuples in R then a list of their A subtuples then "eliminate duplicate tuples" then make a relation with those tuples. But it doesn't. It would be clear if it said (like me) that π returns the set of A subtuples in R. But it doesn't. It only says "π A1,A2,...,An (R) is a relation that has only the columns for attributes A1, A2, ... , An of R." Not clear. Bad writing & proofreading. Ideally a reader should notice--'what does "the columns for" mean?'. – philipxy Dec 20 '18 at 03:08

Understanding Division in Relational Algebra

1 Answers1