15

I am to create a logical data model based on my own project specification and also determine the functional dependencies.

Table User example data:

user_id username    regDate          type subscription
      1    JohnS 01-01-2012 Administrator         NULL
      2   PeterB 02-01-2012     Moderator       Movies
      3   PeterA 02-01-2012          User       Movies
      4     Gary 03-01-2012          User        Books
      5    Irene 03-01-2012          User       Movies
      6     Stan 03-01-2012          User       Movies
      7    Isaac 04-01-2012          User        Books

Primary key: user_id
Unique key: username
Foreign key: subscription

How do I determine the functional dependencies?

I get two functional dependencies:
user_id -> username, regDate, type, subscription
username -> user_id, regDate, type, subscription

philipxy
  • 14,867
  • 6
  • 39
  • 83
Shiraz
  • 165
  • 1
  • 2
  • 6

4 Answers4

7

If "username" is both unique and required (unique and not null), then it's a candidate key. In relational modeling, there's no theoretical difference between one candidate key and another. More specifically, in relational modeling, there's no theoretical reason to pick one candidate key and label it "primary key". A key is a key.

So you're right. There are two functional dependencies here. (Or 8, if you decompose the right-hand-side into individual columns. user_id -> username, user_id -> regDate, etc.)

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • But decomposing into 8 columns is not a good practice in designing an efficient database. – Sahil Babbar Apr 01 '17 at 17:50
  • 3
    @SahilBabbar: Decomposing compound functional dependencies into individual functional dependencies isn't the same thing as decomposing a relation into individual attributes. – Mike Sherrill 'Cat Recall' Apr 01 '17 at 19:12
  • There are many more FDs that hold, the ones implied by the mentioned ones. Also presumably the remaining FDs using this schema's attributes don't hold. Considering the title is "How to determine the functional dependencies", this doesn't really answer the question or debunk misconceptions. – philipxy Feb 12 '20 at 23:01
6

Functional dependencies are defined from a theoretical perspective as follows (Wikipedia):

Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written X → Y) if, and only if, each X value is associated with precisely one Y value; R is then said to satisfy the functional dependency X → Y.

From a technical perspective, you are trying to find attributes that uniquely identify other attributes. As a shortcut, determine your candidate keys and the attributes that depend on them. Your examples are correct because a username, regDate, type, and subscription all depend on the value of user_id. If username is unique and not null, it is a candidate key and also identifies the set of attributes.

Jordan Parmer
  • 36,042
  • 30
  • 97
  • 119
  • 4
    I am sorry for the down-vote, but when you search for functional dependencies, it is NOT enough to determine the candidate keys. There are cases when transitive FDs are present, thus the determinant column set is not a superkey, therefore your answer is incorrect. – Lajos Arpad Oct 26 '13 at 04:48
4

I will assume you are using MySQL, but if not, you can implement your idea in any other RDBMS.

Run the following command to get all your tables:

show tables;

Then iterate all the tables and run the following command for each of them:

show columns;

FDs can be described as follows:

Determinant -> Dependent,
Determinant = {A1, ..., Am},
Dependent = {B1, ..., Bn}

where Ai and Bj are columns. You need to generate all possible scenarios for Determinant and Dependent. For each scenario you will need to view whether exists at least two separate records where the determinant columns match and at least one of the dependent columns do not match. If so, then the scenario is not an FD, otherwise it is an FD. Example: Let's assume, that m = 3 and n = 2:

select count(*)
from mytable t1,
mytable t2
where ((t1.A1 = t2.A1) and 
       (t1.A2 = t2.A2) and 
       (t1.A3 = t2.A3)
      ) and 
      (
       (t1.B1 <> t2.B1) or
       (t1.B2 <> t2.B2)
      )
;

will return the number of records which break the FD-rule. If the value is 0, then the scenario is an FD.

Of course, in your particular case you can omit a few steps, and you have your columns instead of Ai and Bj, but you hopefully understand the idea.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • 3
    I want to add to this, it may seem obvious to some but it still deserves specific mention. The absence of disproof doesn't prove the existence of an FD, therefore if you don't find a counterexample to a given FD it's still not guaranteed that that FD is valid. This of course only applies if you are operating on a subset of the tuples which will be stored in the database, as FD's can be disproved by new data. – Mitchell Carroll Dec 04 '15 at 22:28
  • @MitchellCarroll, if you observe an FD, then it is an FD. From the mere fact that you find an FD, you cannot know for sure whether that FD is applicable only for now or it is guaranteed to be fulfilled in the future as well. In one of my studies, conducted about CFDs (Conditional Functional Dependency), that are FDs applicable if and only if a condition is fulfilled I have separated dependencies (D) to Apparent Dependencies (AD) and Real Dependencies (RD). – Lajos Arpad Dec 06 '15 at 06:36
  • ADs are dependencies, which happen to be true at a given moment, while RDs are dependencies, which will be guaranteed to be true in any given moment. Basically, if you know about a D that it is an RD, then you can prevent insertions/updates which would violate those. I have written code which finds automatically dependencies and if those are validated as RDs, then any insertion/update which would violate those would throw an exception. This way bugs can be automatically found by the system with their exact location in the error logs. – Lajos Arpad Dec 06 '15 at 06:39
  • 1
    You make a good point, but if you're trying to find FDs from only data and no extra information, then RDs don't exist. Everything is just an AD that may have to be scrapped later. – Mitchell Carroll Dec 18 '15 at 05:59
  • For dependencies which were not checked to be ADs or RDs, I use the term CDs, that is, Candidate Dependencies. When you find a dependency, you cannot know whether it is an RD or an AD. This separation can only be done knowing some extra information. So, if you find an FD, it might well be only apparent or a real pattern. It is a good approach to not assume whether they are AD or RD and leave the answer on the shoulders of further analysis. – Lajos Arpad Dec 18 '15 at 11:51
  • Can you elaborate on the meaning of the resultant table of your complex query please? – Ahmed Alhallag May 07 '23 at 22:11
  • @AhmedAlhallag sure thing. This query does a self-join on `mytable` and names any thusly found tuple as `t1` and `t2` and it searches, with an example of (A1, A2, A3) -> (B1, B2) dependency counter examples. A counter-example from an FD is a pair of tuples which are having a match for all determinant columns, but at least one of the dependent columns are mismatching. The count returns the number of counter-examples. – Lajos Arpad May 09 '23 at 15:12
3

In addition to what others have said, if an attribute (or a set of attributes) is a candidate key, then all the attributes must functionally depend on it.

  • A "functional dependency" A->B simply means that no two different values of B are ever related to the same A. Slightly more formal definition is given on Wikipedia, but that's essentially it.
  • Since a key must be unique, even if two tuples contain the same value of some attribute(s), the key values must be different nonetheless. So, different values can never relate to the same key value.

Since all attributes are functionally dependent on the key(s), if there is any other functional dependency, you automatically have a transitive dependency and a violation of the 3NF. So a "non-key" dependency can act as a red flag for spotting normalization errors.


You can think of it from the opposite direction as well: first figure out which functional dependencies make sense in your domain, then use them to identify which attributes could act as keys.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • There are some valid cases when 3NF is not advisable. If you have many records and a few columns can be derived from a few other columns using some functionalities, then in many cases it is better to calculate them once and only read them later than calculate the value each time you are using them. – Lajos Arpad Oct 26 '13 at 05:06