9

I am studying for exams and am failing to find a solid criteria by which I can determine if the Cartesian Product x is to be used or if Natural Join |X| is to be used.

I had come up with a rough guide that:

"If you need to project an attribute that has the same name as an attribute in the table to be joined you must use x and state the table names to be projected: tableA.colname1 = tableB.colname1"

This however doesn't follow some of the solutions in my notes and my lecturer seems to use x with the above convention or |x| interchangeably.

Does anyone have a rule that can be followed to define use of one over the other?


Take for example this schema (only schema related to the question quoted for brevity):

takes(ID, course_id, sec_id, semester, year, grade)
student(ID, name, dept_name, tot_cred)

Q) Find the name of all students who took courses that were taught in either Spring 2011 or Autumn 2011.

My answer attempt:

π name(σ semester="Spring" ^ year=2011(takes ⋈ student)) ∪ π name(σ semester="Autumn" ^ year=2011(takes ⋈ student))

Actual answer:

π name(σ semester="Spring" ^ year=2011 ^ takes.ID=student.ID(takes x student)) ∪ π name(σ semester="Autumn" ^ year=2011 ^ takes.ID=student.ID(takes x student))

Can anyone provide a reason as to why?

In my mind the Natural Join would take care of the takes.ID=student.ID?

Myles Gray
  • 8,711
  • 7
  • 48
  • 70

3 Answers3

10

A natural join, as I understand it, is a projected, filtered Cartesian product:

  • You take the Cartesian product, then
  • select it, so that the values in columns of the same name have the same value, and
  • project it, so that all columns have distinct names.

Under this assumption, your answer is isomorphic to the actual answer.

To see this, you might want to expand the natural join to the above sequence of operators, and float them around using the laws of relational algebra. You'll see that the projection disappears due to the projection to name, and the selection criterion is fused with the selection above. You'll end up with exactly the same tree as the actual answer, even though you never changed the meaning of your own answer!


I can think of one reason why your lecturer uses these concepts interchangeably: your lecturer wants you to understand that these concepts can be used interchangeably, because "the natural join is just a shortcut" (though that's debatable).

  • 1
    So you mean like so: `r ⋈ s == π r.A, r.B, r.C, r.D, s.E (σ r.B = s.B ^ r.D = s.D (r x s))` for say schema: R(A,B,C,D) and S(B,D,E)? – Myles Gray Jan 02 '13 at 19:05
  • Yes! :) At least, that's how I understand it. –  Jan 02 '13 at 19:06
  • Okay that makes sense to me, just worried as the "answers" only show 1 solution to the problem, when in fact there ARE 2 and I would be marked incorrect. – Myles Gray Jan 02 '13 at 19:07
  • There are too many possible answers, and only the smallest would be called correct. Your solution is smaller than the actual solution, so if you were allowed to use ⋈, your answer is "more correct". There are even shorter solutions possible, though. Float the projections over the set union, and you'll save yourself one operator. –  Jan 02 '13 at 19:10
  • I was thinking this: `π name(σ (semester="Spring" ^ year=2011) ∨ (semester="Autumn" ^ year=2011)(takes ⋈ student))` but i'm not sure if that's allowed... – Myles Gray Jan 02 '13 at 19:14
  • 1
    Good point, but I don't know that either. That really depends on what operators you're allowed to use. If you can't use logical disjunction, you simply *have to* use set union. But if you can, then that's almost the shortest possible answer. –  Jan 02 '13 at 19:17
  • What IS the shortest answer by way of interest? – Myles Gray Jan 02 '13 at 19:25
  • `π name(σ (semester="Spring" ∨ semester="Autumn") ^ year=2011 (takes ⋈ student))`, I think. Unless you're also allowed to test for set membership, then it would be `π name(σ (semester in {"Spring", "Autumn"}) ^ year=2011 (takes ⋈ student))`. –  Jan 02 '13 at 19:27
  • Membership tests are not allowed, however logical disjunction has not been excluded. Thanks for your help @Tinctorius – Myles Gray Jan 02 '13 at 19:31
3

Cartesian product is just a special case of natural join where the joined relations don't have any attribute names in common. In Codd's original algebra renaming was a separate operation altogether. To obtain a true cartesian product of two relations that have some attributes in common you would have to rename those attributes before doing the (natural) join.

In the interests of brevity the renamings are sometimes left out of written examples and the product symbol used instead. Unfortunately this obscures the important point, that there is only one kind of join.

nvogel
  • 24,981
  • 1
  • 44
  • 82
-1

I think there are two extreme situation:

  1. there are no duplicate rows in inner join: inner join is equal to intersect (i mean only the result). distinct inner join ~ intersect

  2. there are no common feature in the inner join: inner join is equal to Cartesian product.

spearous
  • 47
  • 1
  • 5