0

I started to teach myself the basics of databases and i am currently working through 1. to 3. normal forms. What i understand until now is the wish to remove redundancy to make my databases less prone to inconsistency during phases of data-change as well as saving space by eliminating as much duplicates as possible.

For example if we have a table with the following columns:

  1. CD_ID
  2. title
  3. artist
  4. year

and change the design to have multiple tables where the first (CD) contains:

  1. CD_ID
  2. title
  3. artist_ID

the second (artist) contains:

  1. artist_ID
  2. artist
  3. year

I see that in the original table the year is transitively dependent on the ID via the artist. So we wanna get rid of that and create a table for the artists so our new CD table is now in third normal form.

But to do so i created another table (the artist table) which again is not in third normal form as far as I understand it, as we have the same type of transitive dependency like before just in another table.

Is this correct and if yes should i also normalize the artist table to be in 3rd NF? When do I stop?

Wolfone
  • 1,276
  • 3
  • 11
  • 31
  • 1
    Instead of linking a german site, why don't you just describe your specific problem? – Paul Spiegel Apr 09 '17 at 14:23
  • I thought it wouldn't matter because there is no text needed, just the tables because my question isn't about some special tables but a general question about design and i probably wouldn't be able to show what i wanted in plain text as easy as via an example. But i can try to explain better. Which point is unclear? – Wolfone Apr 09 '17 at 16:24
  • 1
    I am from germany :-) - My point is that using an example in german you are probably excluding users from answering your question. Even if the example is in english, it's better to post it in your question. – Paul Spiegel Apr 09 '17 at 16:45
  • 1
    Edited to be more selfcontaining and changed the initial question a little bit. I hope it is more understandable now. – Wolfone Apr 10 '17 at 12:07
  • 1. You have still not given the FDs that hold in this relation variable, that are the basis for decomposing it. 2. Normalization does not introduce new attributes (eg ids). It replaces a relation variable by others ('components'), each with a proper subset of its attributes, that always join to what its value would have been. 3. You do not "wanna get rid of that" (whatever you mean by that in detail); you want to follow some normalization algorithm. You really need to see a textbook/course/slides, your question doesn't show understanding of basics of normalization. – philipxy Apr 10 '17 at 13:11
  • Why do you think Artist "is not in third normal form as far as I understand it"? See my added PS. – philipxy Apr 13 '17 at 03:19

1 Answers1

2

TL;DR You need to follow a published algorithm to decompose to a given normal form.

PS You didn't get Artist from the original CD via normalization, since you introduced a new column. But assume table Artist has the obvious meaning. Why do you think it "again is not in third normal form as far as I understand it"? If artist -> year in the original CD then it also does in Artist. But then {artist} is, with {artist_id}, a CK (candidate key) of Artist, and Artist is in 3NF (and 5NF).


From your question's original version plus the current one, you have a proposed base table CD with columns cd_id, title, group & year, holding tuples where cd cd_id titled title was made by group group that formed in year year. Column cd_id is unique, hence is a CK. FD {group} -> year also holds.

Normalization does not introduce new column names. It replaces a proposed base table by others, each with a smaller subset of its columns, that always join to what its value would have been. Normalization up to BCNF is based on FDs (functional dependencies), which are also what determine the CKs of a base table. So your question does not contain a decomposition. A possible decomposition reminiscent of your question, which might or might not have any particular properties, would be to tables with column sets {cd_id, title, group} and {group, year}.

Other FDs hold in the original. Some hold because of what the columns are; some hold because of the CK; some hold because {group} -> year holds; in general, certain ones hold because all three of those do. And maybe others hold because of what tuples are supposed to go into the relation and what situations can arise. You need to decide for every possible FD whether it holds.

Of course, you might have been told that the only ones that hold are the ones that have to hold under those circumstances. But you won't have been told that the only FD that holds is {group} -> year, because there are trivial FDs and every superset of a CK functionally determines every set of columns.

One definition of 3NF is that a relation is in 2NF and no non-prime column is transitively functionally dependent on any CK. (Notice each condition involves other definitions.) If you want to use this to find out whether your relation is in 3NF then you next need to find out what all the CKs are. You can do this fastest via an appropriate algorithm, but you can just see which sets of columns functionally determine every column but don't contain a smaller such set, since those are the CKs. Then check the two conditions in the definition.

If you want to normalize to 3NF then you need to follow an algorithm for decomposing to 3NF. You don't explain what process you think you should follow. But if you aren't following a proven algorithm then whatever components you pick might or might not always join to the original and might or might not each be in any particular higher normal form. Note that examples of decompositions you have seen are not presentations of decomposition algorithms.


The NF (normal form) definitions give conditions that a relation must meet to be in that NF. They don't tell you how to nonloss decompose (preserving FDs when possible) to relations in higher NFs. People have worked out algorithms for producing decompositions to particular NFs. (And decomposing to a given NF doesn't in general involve first decomposition to lower NFs. Going through lower NFs can actually prevent good higher-NF decompositions of the original from being generated when you get to decomposing per a higher NF.)

You may also not realize that when some FDs hold, certain other ones must hold. The latter can be determined via Armstrong's axioms from the former. So just because you decomposed to get rid of a particular FD whose presence violates a particular NF doesn't mean there weren't a bunch of other ones that violated it that you didn't deal with. They can be present in the new components. Or they can be not present in problematic ways, so that you have not "preserved" them when you could have, leading to poor designs.

Learn about specific NF algorithms, and for that matter NFs and normalization itself, in a college/university textbook/course/presentation. Many are online.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • By saying "new components" you mean new relations/tables, am i right? I read about the conditions for first, second, and third NF and am not sure if my question was clear enough. From my understanding the new CD-table in my updated original post is in 3rd NF but the artist-table is not. So to get the CD-table into 3rd NF i created another table which is not. Now i wonder how to make the decision where i stop with decomposing and what triggers the decision to leave some tables in non or lower normalized forms. – Wolfone Apr 10 '17 at 12:10
  • Yes, by "new components" I mean the new tables that one uses instead of the original. The join of the new is always equal to the original. I don't understand why you think the original table is still around. When we say 'normalize a table', we mean normalize it out of existence; that we replace it by others. When you say "the new CD table", you're just arbitrarily choosing to call one of the new tables by the same name as the original. Re normalizing, as I said, you follow an algorithm (google '3nf algorithm' or 'bcnf algorithm') that people have proven works. Is this not clear in my answer? – philipxy Apr 10 '17 at 13:06