5

I am reading about Codd’s Eight Original Operators in Inside Microsoft SQL Server 2008: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, and Steve Kass and do not understand the Divide operator.

Quotes defining the Divide operator:

“A divisor relation is used to partition a dividend relation and produce a quotient relation. The quotient relation is made up of those values of one column from the dividend table for which the second column contains all of the values in the divisor.”

This statement is in agreement with Wikipedia's definition and example.

“The formula for the Divide operator includes three relations: a divide by b per c, where a is the dividend, b is the divisor, and c is the mediator relation. Let relation a have attributes A and relation b attributes B. The Divide operator returns a relation that includes of all tuples from divisor such that a tuple {A, B} appears in the mediator relation for all tuples from divisor relation.”

The diagram below is used to demonstrate this statement. I believe the relations are presented in the following order: dividend, divisor, mediator, and end result.

enter image description here

The second relation (the divisor) has {a, x}, {a, z}, {b, x} and {b, z} for tuples. My thought process follows that since there are tuples {b, x} and {b, z} that b should be included in the end result. I've check the book corrections on the book's website (linked at the beginning of this post) and am certain that I am wrong.

Why is the result of the diagram example a and not a and b?

philipxy
  • 14,867
  • 6
  • 39
  • 83
Jesslyn
  • 696
  • 10
  • 25

2 Answers2

4

Relational Division has always been a mess, and is likely to stay that way. It was originally invented as the means for relational query systems to be able to formulate/answer questions such as, e.g., "what is the list of customers who have subscribed ALL possible insurance policy types". That is, it was intended as the vehicle for formulating queries that involved some kind of universal quantification as a predicate determining the result set.

Elaborating further on my customers/policies example, let's assume that the set of "ALL possible insurance policy types" is itself time-varying, i.e. over time, new policy types can emerge, while others can be discontinued. Let's further assume that "ALL possible insurance policy types" in a certain query, means, specifically, "ALL policy types that are currently open for subscrition by customers" (that is, the discontinued policy types are not part of this set of "ALL" types).

Let's assume that this set of "ALL possible policy types" at a certain moment is {TYPE1, TYPE3}. TYPE2 has been discontinued. Let's also assume that customer ES still has a policy of type TYPE2, obviously dating from before when it was discontinued. Thus customer ES has policies of type {TYPE1, TYPE2, TYPE3}.

Now answer the question whether this customer has "ALL policy types that are currently open for subscription". Your answer should be a firm 'yes'. You might get a sense of where this is going : relational division centers around comparison of two sets. One is a "comparand" (the customer's set of subscribed policy types), the other one is a "reference" (the set of policy types that are currently open for subscription).

Now there are at least two useful comparisons that can be made between sets : one is for equality, the other is for set containment (subsetness). In some cases of querying you will want the equality test, while in others you will want the containment test. And the relational operator called 'division' (in whatever of its umpty flavours) does not allow to make this distinction. I gather it is this phenomenon that you are asking about, and the answer is simply that the choice is made by design, so to speak, and hardwired into the operator's definition. It makes the operator "useful" in those cases where its definition matches your needs, and useless in other cases.

The sort-of good news is, when you have to spell out the SQL for a relational division, there won't be much difference between division-with-equality and division-with-containment (despite the fact that the algebra operator is, by definition, only one of the two, and the other one simply even does not have an algebraic operator). Main problem is that set equality itself is very messy to express in SQL, and that's not any less so "inside a relational division query" ...

And then there are still all the valid points already made by philip. Read them, but very carefully.

Erwin Smout
  • 18,113
  • 4
  • 33
  • 52
2

The result in the diagram should indeed be rows {a} and {b}. But the book and your message are mixed up.

There are a number of relational DIVIDE variants. There is Codd's divide, Date's abridged version of Codd's divide, Todd's divide, Date and Darwen's Small (Codd-like) divide-per and Great (Todd-like) divide-per and their Generalized Small and Great divide-pers, and Darwen's per-divide-per.

The quotes and diagram are from the book. First they are trying to talk about Codd's divide and it has two inputs; "a divisor relation is used to partition a dividend relation". (The book even says: "This is the way the Divide operator was defined originally.") But its second sentence is wrong. First, it talks about columns instead of disjoint column sets. Second, it limits its second operand to only having a subset of the first operand's columns. (It's Date's abridged version of Codd's divide.) Third, it is unintelligible. Let A, B and C be disjoint attribute sets and AB and BC be relations with attribute sets A U B and B U C. Then Codd's AB DIVIDEBY BC returns A subrows that appear in AB with every B subrow that appears in BC.

The wiki actually describes Codd's divide. So only if you ignore the problems above is the first quote "in agreement with" it.

Then the second quote is talking about a different "extended" divide operator. The book explains that Codd's divide doesn't reflect a natural phrasing using "for all" in common situations. They then unclearly introduce the other operator. They are trying to define the Small divide-per. The input "includes three relations: a divide by b per c". The diagram shown is even labelled as an "extended divide".

Unfortunately the definition is not only ungrammatical, it is unintelligible. A DIVIDEBY B PER AB returns rows that appear in A that appear as subrows in AB with every row that appears in B. They should say that it returns the tuples {A} in the dividend such that a tuple {A, B} appears in the mediator with all tuples {B} in the divisor

That book is a mess. The writing is terrible. The proof reading is terrible. Given the text in this section there is no reason to expect the answers to be right. But also, you are not reading carefully.

Then you misinterpret the arguments in the diagram. But you don't notice because you are not following the definition. Thus not noticing that the definition is unintelligible. The diagram shows relations A-AB-B-result which is dividend-mediator-divisor-quotient. So again you are not reading carefully.

And it is true that both {a} and {b} rows should be in the result. But I can't really say you're "right" about that because your "thought process" seems to be "get a vague impression". Some of the vagueness is surely due to the poor writing in the book. But you don't seem to be noticing that the book is unintelligible. Read carefully!

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • 3
    When you describe the book with terms like "unclearly", "ungrammatical", "unintelligible", "a mess", and "terrible", I think it would be unreasonable for you to say that I am not reading carefully. I do agree that I have been getting mixed up a lot while trying to read this book and thus my understanding thus far is not guaranteed to be correct. – Jesslyn Jul 16 '14 at 12:22
  • 2
    If the book is really unintelligible then a better advice would be to just ditch it, rather than "read carefully" !!! – Erwin Smout Jul 16 '14 at 13:52
  • @Jesslyn I tried to write the most helpful answer I could. I quoted clear statements, most in the question itself, that contradict the question. (But maybe I was not clear enough.) – philipxy Jul 16 '14 at 21:23
  • @ErwinSmout Maybe ditching the book is good advice. I considered it. But it's a case of "as well as" not "rather than". The message quotes & references then contradicts clear things. – philipxy Jul 16 '14 at 22:20