Data type for sql id field with limited set of possible values

Question

It is always recommended to use int type for id fields in SQL tables - I find some similar answers here, here and here.

But in all these questions id field is unique with many possible values like user_id in users table of a blog. Should I still use int for id field that have limited range of values like user type, that can be 'admin', 'user', 'editor'?

This is not question about users table, it mentioned only for example, in real databases on my work I often find such fields with 3-20 possible values that change very rarely, but used in many queries, views etc so we have a lot of SQL code like.

SOME_FIELD in (1,7, 12) 
SOME_FIELD != 2

We can also have a table with long names of such types to show in UI and foreign key to this table. It is not a natural key like USA citizen's social security number. There is usially a lot of data in one row with many fields like varchar(100). I think it will be better with

SOME_FIELD in ('Draft', 'Auto-Draft' , 'Trash') 
SOME_FIELD != 'Published'

But it directly contradict to all answers from simular questions that I find here. I tried to examine some popular application - Wordpress database has varchar fields for post_status or post_type but int for user_status.

Should I still use int type for such fields? Or it is a question of personal taste?

To make question more clear: the biggest problem with varchar key, that was mentioned in answers to other questions is perfomance problems with joins and other operations. Will these problems decrease with limited number of varchar values, in one language, with small limited size like varchar(10) and values, that modified rarely and only by programmers, not by users, so answers from here or here or using IDENTITY/AUTO_INCREMENT are not relevant? Is there any research on that topic or experimental results from experienced developers?

It is not question just about MySQL or Wordpress, I also use MS SQL Server and data about other RDBMS may be interesting.

I think one thing that could affect the design pattern would be the size of the table (number of rows) and the performance hit you could get on a join. Denormalizing is common, but should be done as a last resort. Just think if you wish to present the user with a distinct list of post_statusses, you would have to select distinct over all the posts. — Adriaan Stander, Jan 23 '14 at 09:20
If I need to show such data to user I usially make some table with long and pretty visible names like "Document ready to send to other organisation" and then I can make foreign key to this table. But I still have SQL code like 'SOME_FIELD in (1,7, 12) and SOME_OTHER_FIELD != 3' — Vedomir, Jan 23 '14 at 10:37

score 0 · Answer 1 · answered Jan 23 '14 at 09:20

If you know the table is only going to be a small size then you can use smallint and tinyint here's some more detail about their max values and sizes.

The main reason for wanting to use ints is to make joining easier and to keep the key sizes down (an int takes up a lot less space than a typical nvarchar and doesn't have collocation issues)

I would caution you - using a smallint where a tinyint may have done is usually a lot less of an problem than the challenges of changing the data type of a PK column later on. If in doubt er on the side of caution!

Data type for sql id field with limited set of possible values

1 Answers1