-2

I am concerned about which candidate key to select as the primary key among the candidate keys.

Assume that using mysql database(innoDB). Suppose we have a unique value, Student Number, and a unique value, ID Number(eg Social Security Number).

Student ID number and ID number can each be a candidate key. In this case, what value should I set as the primary key even considering auto-increment new column?

My guess is that innoDB(mysql) use primary key to create the clustering index. So, is it right to use a column where I need to find a specific range, since it has the advantage of being able to find a range?

Thank you!!

rabong
  • 139
  • 1
  • 7
  • *which candidate key to select as the primary key among the candidate keys* Primary key 1) is unique, duplicates are not possible even in theory 2) does not allow NULLs. And then just focus on common sense. – Akina May 25 '22 at 17:25
  • *Student ID number and ID number can each be a candidate key. In this case, what value should I set as the primary key even considering auto-increment new column?* Imagine that you insert a row for a student. Is it possible: #1) to skip Student Number temporarily and insert the value later? #2) to skip ID Number temporarily and insert the value later? If true for some column than it cannot be PK. For two "true" - use synthetic PK. – Akina May 25 '22 at 17:28
  • You are focusing too much on inconsequent performance issues, and not on correct data modeling. You can always add good indexes later, but make sure the data modeling is correct, first. – The Impaler May 25 '22 at 17:53
  • SSNs are a bad idea -- You will be in hot water if your dataset gets hacked/stolen. – Rick James May 25 '22 at 21:26

1 Answers1

1

First, you should be aware that the US Social Security Number is not unique.

You're correct that InnoDB always treats the primary key as the clustering index. You don't have to guess, it's documented in the manual: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html

You don't necessarily need to use the primary key to find a specific range. You could use a secondary index as well. You could even use a non-indexed column, but it would result in a table-scan which causes poor performance unless the table is very small.

Given the choice between searching the clustered index versus a secondary index, it's a little bit more optimized to search the clustered index.

There are exceptions to the guideline, and we can't know if your query is one of those exceptions because you haven't described any of your queries.

This brings up a broader point: you can't choose the best optimization strategy without knowing the specific queries you need to optimize.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    I would also add that if the OP uses US social security number (or other national ID numbers) as a primary key, then it would also appear in other tables as foreign key. Such ID numbers are usually considered as sensitive information and there is a fair chance that is subject to various data protection regulations. Not sure I want to complicate things by using such a data as foreign keys across several tables in my application! – Shadow May 25 '22 at 18:11
  • Yes, good point. – Bill Karwin May 25 '22 at 18:20