12

I have large DB (many millions of rows) and I'm trying to make the best choices for datatypes for 2 fields. Most everything I have made varchar or INT. However, 2 fields I'm wondering if Enum is the best way.

Field 1 First field is gender, My data is currently either 'Male' or 'Female' or it could be blank. I initially set it up like this:

GENDER VARCHAR(6) NOT NULL

Is this the best way, or would it be better to set it up as:

GENDER ENUM ('Male', 'Female') NOT NULL

And do I need to make it NOT NULL to allow for the blank, or do I need to add the blank, i.e.

GENDER ENUM ('Male', 'Female', '') NOT NULL

Not to mention, I'm considering converting the entire field to just M or F.

Field 2: I have pretty much the same things to consider, except for the state field, which could include 52 values (50 states, DC, plus blank).

I guess the biggest question is - Is all this Enum stuff worth it? My DB has many millions of rows, so everything is a factor, but should I just be using VARCHAR(2) for the states instead of ENUM.

Johan
  • 74,508
  • 24
  • 191
  • 319
Kevin
  • 1,685
  • 7
  • 28
  • 55

2 Answers2

11

The rule of thumb I usually apply to such cases is NOT to use MySQL ENUMs. Using them creates maintenance issues, especially around adding/removing/renaming some of the values. In InnoDB, renaming and removing an enum value is heavy on big tables. Adding a value isn't (as long as you don't add it in the middle).

As you probably DO want to keep this column in context, and not to allow any value out of this context, the best way IMHO is to use INT, and connect it as a foreign key to a values table (columns id, value).

You will be able to add and rename values in this table easily, and before you remove a value the FK will enforce handling any existing records in the main table which have this value.

To read the data easily, all you need is a simple JOIN.

Note: since genders are pretty final, you may want to leave it as VARCHAR(1) or use an ENUM like Johan suggests, but who knows? You may want to support transgenders and androgyny in the future. Not kidding.

Galz
  • 6,713
  • 4
  • 33
  • 39
  • normally I'd agree, but for gender 2, maybe 3 absolute maximum 7 values I think it is overkill. – Johan Oct 05 '11 at 21:35
  • 1
    @Johan: 7 is a very small number for my alien space battle game. – ypercubeᵀᴹ Oct 05 '11 at 21:38
  • 1
    Jokes aside, `CHAR(1)` is a valid approach. It even lets you add a foreign key to a reference table (with 2, 3, 7 rows whatever) and you don't have to mess with the `ENUM` when you need to add a new gender (or an `'Unknown'` value). – ypercubeᵀᴹ Oct 05 '11 at 21:40
  • One of the things that concerns me is importing CSV files in the future. I don't want to run into issues with blank fields that might interrupt an import. – Kevin Oct 05 '11 at 21:42
  • Also, I'm so in the details right now, I'm dealing gender as in male/female or unknown (blank). I hadn't even considered the 'neither' or 'both'. – Kevin Oct 05 '11 at 21:44
  • @Kevin, if you allow null that will not be an issue. – Johan Oct 05 '11 at 21:44
  • 1
    @ypercube, it's almost bedtime, how am I supposed to get **that** picture out of my head? – Johan Oct 05 '11 at 21:45
  • Char(1) looks like a perfect solution. Thanks. – Kevin Oct 05 '11 at 21:45
  • `CHAR(1)` and `TINYINT` has the same goods (and bads) that Galz described (1 byte, FKs, a reference table, integrity constraint, renaming). I just prefer seeing `WHERE gender='F'` to `WHERE gender=2`. – ypercubeᵀᴹ Oct 05 '11 at 21:49
  • @ypercube - I see your point, but add a simple JOIN and you get WHERE ref_table.value = 'F'. Since this is indeed less elegant I agree that for genders it may be an overkill, but certainly not for states. – Galz Oct 05 '11 at 21:53
  • I totally concur with the lookup table / foreign key idea. I have databases with hundreds of millions of rows in them, and an ALTER TABLE on a column to add another value to an enum locks the table up for flippin' ages. Lookup tables are the way to go for me. – Kenny Oct 05 '11 at 21:59
  • @Kenny, I'm starting over with a massive database for just that reason. I'm at a trial and error stage to see if the DB works faster with some modifications. – Kevin Oct 06 '11 at 05:53
7

If you want to have a value for no value entered, use null that's what null is designed for!

If you want to specify something in between male and female, use

ENUM('male','female','other') NULL;

Note that an enum does not store the literal text value in the column.
male is stored as 1, female as 2 and other as 3 etc.
This means that it is much more efficient than varchar.

If you are struggling with null in your selects, note that you can use the ifnull or coalesce functions to replace the null with something more useful.

SELECT IFNULL(gender,'other') as gender FROM people;
-- or the identical statement
SELECT COALESCE(gender,'other') as gender FROM people;
Awaterujin
  • 63
  • 1
  • 5
Johan
  • 74,508
  • 24
  • 191
  • 319