
I have been trying for the past few days to come up with a solution to the following question.

Let's suppose that we have the following three tables.

Film(ID',Title,Country,Production_Date)

Actor(ID',Name,Genre,Nationality)

Cast(Actor_ID',Film_ID',Role)

Given information:

Film holds N(film) = 50,000 records, r(film) = 40 bytes; sequentially organized, index on the PK.

Actor holds N(actor) = 200,000 records, r(actor) = 80 bytes; heap organized, index on the PK.

Cast holds N(cast) = 100,000 records, r(cast) = 25 bytes; heap organized, no indexes.

The execution tree and relation expression for an execution plan is in the following picture:

[image: execution tree and relational expression for the plan]

For the lower-level join between Cast and Film I calculate the following costs:

Block Nested Loop Join: B(cast) × B(film)

Index Nested Loop Join: B(cast) + N(cast) × C(film)

I keep the smaller of the two values, which here comes from the INLJ.
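To make the comparison concrete, here is a sketch of the arithmetic. Note that the block size and the index-lookup cost C(film) are not given in the question, so the values below (4096-byte blocks, 2 block accesses per index probe) are assumptions for illustration only:

```python
import math

BLOCK = 4096  # assumed block size in bytes (not given in the question)

# Table statistics from the question
n_cast, r_cast = 100_000, 25
n_film, r_film = 50_000, 40

# Blocks per table, with unspanned records: floor(BLOCK / r) records fit per block
b_cast = math.ceil(n_cast / (BLOCK // r_cast))
b_film = math.ceil(n_film / (BLOCK // r_film))

c_film = 2  # assumed cost in block accesses of one index lookup into Film

bnlj = b_cast * b_film           # Block Nested Loop Join: B(cast) x B(film)
inlj = b_cast + n_cast * c_film  # Index Nested Loop Join: B(cast) + N(cast) x C(film)

print(f"B(cast)={b_cast}, B(film)={b_film}")
print(f"BNLJ cost = {bnlj}, INLJ cost = {inlj}")
```

With these assumed constants the INLJ indeed comes out cheaper, matching the choice above; with a different block size or C(film) the numbers change, but the formulas do not.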

Question:

Now, how can I calculate the size of the joined table and the new r (the size of one record in the joined table)? I need these to compute the cost B, in blocks, of the lower join's result, so that I can then cost the upper-level join between that intermediate result and the Actor table.

HelloIT
1 Answer


I assume you want a natural join on FILM.ID = CAST.FILM_ID, and that CAST.FILM_ID is a foreign key referencing FILM.ID.

1) Size of one row:

A join of Film and Cast results in tuples of the form

[FILM_ID, TITLE, COUNTRY, PRODUCTION_DATE, ACTOR_ID, ROLE]. 

Hence the row size should be something like

R(FILM JOIN CAST) = R(FILM) + R(CAST) - R(FILM_ID)

since the FILM_ID is the only column which is shared.

2) Number of rows:

N(FILM JOIN CAST) = N(CAST)

since there is exactly one matching row in FILM for every row in CAST (the foreign key guarantees it).
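Putting the two parts together, a sketch of the joined-table statistics. Both the FILM_ID width (taken here as a 4-byte integer) and the block size (4096 bytes) are assumptions, since the question specifies neither:

```python
import math

BLOCK = 4096         # assumed block size in bytes
r_film, r_cast = 40, 25
r_film_id = 4        # assumed: FILM_ID stored as a 4-byte integer

# 1) Row size: the shared join column is counted only once
r_join = r_film + r_cast - r_film_id   # R(FILM) + R(CAST) - R(FILM_ID)

# 2) Row count: the FK means each Cast row matches exactly one Film row
n_join = 100_000                       # N(FILM JOIN CAST) = N(CAST)

# Blocks occupied by the intermediate result (unspanned records)
b_join = math.ceil(n_join / (BLOCK // r_join))
print(f"r = {r_join} bytes, N = {n_join}, B = {b_join} blocks")
```

That B value is what feeds into the cost formulas for the upper-level join with Actor.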

nCessity
  • That's what I was thinking but how can I figure out the R(FILM_ID) in order to calculate the R(FILM JOIN CAST)? – HelloIT Feb 02 '17 at 15:39
  • In theory that depends only on the data type of the column CAST.FILM_ID. An integer takes 4 bytes... and so on. A VARBINARY or VARCHAR is harder to predict. There might be some overhead, implementation-wise. So, that really depends on your RDBMS, your schema and your data. – nCessity Feb 03 '17 at 11:27