Decomposition Lossless Join Algorithm

Question

I have been provided with the following explanation for lossless joins. Can someone please explain what the variable 'r' is and how it can appear on both sides of the algorithm/equation/formula?

"If a relation R is decomposed into relations R1, R2 such that for every legal instance r of R...

r = πR1(r) ⋈ πR2(r)

...then the decomposition itself is said to be a lossless-join decomposition."

Note: R1 and R2 are subscript.

What is the source of this quote? Knowing it would let us give credit and better understand what they are trying to say. – philipxy Nov 04 '17 at 20:20
@philipxy This was the explanation given by my Relational Database Management Systems lecturer on lossless decomposition. I do not know his source, but I honestly wouldn't be surprised if it was referenced from an unreliable source such as Wikipedia. Does the formula actually make logical sense or is it written incorrectly? Regardless, is the concept just trying to illustrate that: If a relation R is decomposed into R1 and R2 _(regardless of what they may represent)_, then at any point in time _(instance)_ R1 ⋈ R2 should always reproduce the original relation R? – Coffeebeean Nov 05 '17 at 06:10
Yes, a decomposition is lossless if and only if the component projections always join back to the original, i.e. if and only if in every state R = R1 ⋈ R2. Get clear in your mind what each use of a name denotes. In the first mention in the quote and your comment, the Rs are relation *variables*. In calls to π, the Rs mean the set of attributes of the corresponding variable. In your equation the Rs denote *the values of the corresponding variables*. The quote clearly says that r is a "legal instance of" R and talks about it: r is not a variable, it is the value of variable R "at a point of time". – philipxy Nov 05 '17 at 09:07

AntC · Answer 1 · 2017-11-04T13:48:37.277


r is supposed to stand for any relation value/instance of schema R (that is losslessly decomposable). But even if a relation value is losslessly decomposable, we must pay attention to which (sets of) attributes we decompose into.

The more important question: what are R1, R2?

Your quote is perhaps based on the Wikipedia article or one of the links from it, which appears to be almost complete garbage, far worse than Wikipedia's usual standard on Relational Model topics. (For example, it shows a Cartesian product to recompose the projections, whereas a lossless join usually requires a join: the bowtie operator, as you show.)

Operator π is the Relational Algebra projection. π is usually shown subscripted with an attribute name (or more properly a set of attribute names). And those are usually symbolised by X, Y or similar. Then R1, R2 should be sets of attribute names, not relations. They certainly can't be both. (All I can imagine is that R1, R2 are intended to stand for the schemas/sets of attributes of the two relations.)

Furthermore, for lossless join decomposition we require the two projections together to include all the attributes of R. (And typically some attributes in common, so that the join matches tuples together.)

So we should have

r1 = πX(r)             -- r1 is the value of R1 corresponding to r
r2 = πY(r)             -- ditto for R2
attributes of R = X ∪ Y   -- intersection of X, Y not necessarily empty
r = r1 ⋈ r2

A clear example is where R has attributes {A, B, C} and {A} is a key. Then we can decompose as X = {A, B}, Y = {A, C}.
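To make that example concrete, here is a minimal Python sketch. The helper names (`relation`, `project`, `natural_join`, `rejoins_to_original`) and the sample data are my own, purely for illustration; a relation value is modelled as a set of tuples, each tuple a frozenset of (attribute, value) pairs. Note that it only tests one instance r, whereas the definition requires the equation to hold for every legal instance of R.

```python
def relation(*rows):
    """Build a relation value: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(r, attrs):
    """Projection π_attrs(r): keep only the given attributes; duplicates collapse in the set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r1, r2):
    """Natural join r1 ⋈ r2: combine tuples that agree on every shared attribute."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


def rejoins_to_original(r, x, y):
    """Check r == π_X(r) ⋈ π_Y(r) for this single instance r."""
    return natural_join(project(r, x), project(r, y)) == r


# Hypothetical sample data in which {A} is a key: no two tuples share an A value.
r = relation(
    {"A": 1, "B": "p", "C": "u"},
    {"A": 2, "B": "p", "C": "v"},
    {"A": 3, "B": "q", "C": "u"},
)

# X = {A, B}, Y = {A, C}: the join on the key A gives back exactly r.
lossless = rejoins_to_original(r, {"A", "B"}, {"A", "C"})
print(lossless)  # True

# X = {A, B}, Y = {B, C}: B is not a key, so the join adds spurious tuples.
lossy = rejoins_to_original(r, {"A", "B"}, {"B", "C"})
print(lossy)  # False
```

The second check is why the choice of attribute sets matters: X ∪ Y still covers all attributes of R, but joining on B produces five tuples instead of the original three.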

edited Nov 04 '17 at 13:48

answered Nov 04 '17 at 12:56


Not only is that Wikipedia page unclear, it wrongly states that "for the decomposition to be lossless" certain things must be so; but what is correct is that those conditions must be so for a given lossless decomposition to preserve a given FD. – philipxy Nov 04 '17 at 20:18
R, R1 & R2 "certainly can" be and are used to name a variable and/or schema and to name its attribute set. (And in the wiki page, its value.) I.e. they are ["abused"](https://en.wikipedia.org/wiki/Abuse_of_notation#Abuse_of_language). This is mentioned explicitly in well-written presentations. Also it would be clearer to say that we "require the two projections together to" not just "include" but *be* "all the attributes". Except that it is not the case that all or part of R = R1 ∪ R2 "should" be explicitly required since it is implied by the equality of relations. – philipxy Nov 04 '17 at 21:22
"Abused" notation is not going to help explain what's going on. That's not a typo: I wish to point out `intersection of X, Y not necessarily empty` (contrary to what one might try to guess from Wikipedia's using Cartesian product). `union of X, Y` must _be_ "all the attributes" of `R` is what that line is requiring. (To say the union is possibly empty would be true but unhelpful, because then `R` must be of degree zero.) – AntC Nov 05 '17 at 02:27
You clearly don't know what's going on, which is that notation is being abused, so I explained it to you. I also said that the abuse should be explained in the quote but the quote is not well-written. – philipxy Nov 05 '17 at 02:38
