10

We generally prefer to have all our varchar/nvarchar columns non-nullable with an empty string ('') as a default value. Someone on the team suggested that nullable is better because:

A query like this:

Select * From MyTable Where MyColumn IS NOT NULL

is faster than this:

Select * From MyTable Where MyColumn = ''

Anyone have any experience to validate whether this is true?

ssapkota
Randy Minder
  • At least under Oracle, an empty string is also treated as `NULL`. – zneak Jun 19 '10 at 15:11
  • My experience: not under MySQL. – MvanGeest Jun 19 '10 at 15:14
  • Your examples aren't the same. Either the first one should be `MyColumn IS NULL`, or the second one should be `MyColumn <> ''`. – Cheran Shunmugavel Jun 19 '10 at 15:17
  • An empty string is not `NULL`, and shouldn't be. `NULL` means "no value", while an empty string... is just an empty string. MySQL (and any sane SQL server) treats `NULL` differently from any other value: nearly any operation on a null value returns null (including `NULL = NULL`). – Mewp Jun 19 '10 at 15:21

5 Answers

13

On some platforms (and even versions), this is going to depend on how NULLs are indexed.

My basic rule of thumb for NULLs is:

  1. Don't allow NULLs until justified

  2. Don't allow NULLs unless the data can really be unknown

A good example of this is modeling address lines. If you have an AddressLine1 and AddressLine2, what does it mean for the first to have data and the second to be NULL? It seems to me you either know the address or you don't, and having partial NULLs in a set of data just asks for trouble when somebody concatenates them and gets NULL (ANSI behavior). You might solve this by allowing NULLs and adding a check constraint: either all the address information is NULL or none is.
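As a rough sketch of the all-or-nothing check constraint idea, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The `Address` table, its `Id` column, and the sample rows are assumptions made up for illustration, and SQLite's `||` operator stands in for whatever concatenation your platform uses. SQL Server or another engine would need its own syntax, and its concatenation behavior can depend on settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: both address lines are NULL, or neither is.
conn.execute("""
    CREATE TABLE Address (
        Id INTEGER PRIMARY KEY,
        AddressLine1 TEXT,
        AddressLine2 TEXT,
        CHECK ((AddressLine1 IS NULL AND AddressLine2 IS NULL)
            OR (AddressLine1 IS NOT NULL AND AddressLine2 IS NOT NULL))
    )
""")

# A fully known address (second line empty) and a fully unknown one are accepted.
conn.execute("INSERT INTO Address VALUES (1, '1 Main St', '')")
conn.execute("INSERT INTO Address VALUES (2, NULL, NULL)")

# A partially NULL address violates the check constraint.
try:
    conn.execute("INSERT INTO Address VALUES (3, '2 Main St', NULL)")
    partial_rejected = False
except sqlite3.IntegrityError:
    partial_rejected = True

# Concatenating a value with NULL yields NULL (None in Python).
concat_with_null = conn.execute("SELECT '2 Main St' || NULL").fetchone()[0]

row_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
```

With the constraint in place, the concatenation problem can only arise for rows where the whole address is unknown, which is easier to reason about than a mix of known and NULL parts.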

Similar thing with middle initial/name. Some people don't have one. Is this different from it being unknown and do you care?

Also, date of death: what does NULL mean? Not dead? Unknown date of death? Many times a single column is not sufficient to encode knowledge in a domain.

So to me, whether to allow NULLs would depend very much on the semantics of the data first - performance is going to be second, because having data misinterpreted (potentially by many different people) is usually a far more expensive problem than performance.

It might seem like a little thing (in SQL Server the implementation is a bitmask stored with the row), but only allowing NULLs after justification seems to me to work best. It catches things early in development, forces you to address assumptions and understand your problem domain.

Cade Roux
  • As for date of death: NULL would mean that there is no known date. In this case, usage of null is justified, because you could want to find, for example, the oldest date recorded, or count dead people (NULL is not counted). The same thing applies to a middle name, if you'll ever want to know how many people in your database have those. – Mewp Jun 19 '10 at 16:04
  • @Mewp You can't count people by COUNT(DtOfDeath), there are always dead people where you know they are dead but you don't know the date of death (or it's a possible range - as we know from our experience in New Orleans after Katrina). My point is that you have to think how you want to use the data and what you know in order to model the problem domain successfully. – Cade Roux Jun 19 '10 at 16:13
6

If you want to know that there is no value, use NULL.

As for speed, IS NULL should be faster, because it doesn't use string comparison.

Mewp
4

If you need NULL, use NULL. Ditto empty string.

As for performance, "it depends".

If you have varchar, the row stores an actual value for the length, even for an empty string. If you have char, the row stores the full declared length. Depending on the engine, a NULL is not stored in-row as data at all (SQL Server uses a NULL bitmap, for example).

This means IS NULL is quicker, query for query, but it could add COALESCE/NULLIF/ISNULL complexity.
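To make the added complexity concrete, here is a small sketch using Python's built-in `sqlite3` module. It reuses the `MyTable` and `MyColumn` names from the question; the `Id` column, the three sample rows, and the `fetch_ids` helper are assumptions for illustration. It says nothing about speed. It only shows which rows each predicate selects once a column can hold both NULL and an empty string. `ISNULL` is left out because it is specific to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, MyColumn TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "abc"), (2, ""), (3, None)],  # a value, an empty string, a NULL
)

def fetch_ids(where):
    """Hypothetical helper: return the Ids matching a WHERE clause."""
    sql = f"SELECT Id FROM MyTable WHERE {where} ORDER BY Id"
    return [row[0] for row in conn.execute(sql)]

only_null = fetch_ids("MyColumn IS NULL")           # [3]
not_null = fetch_ids("MyColumn IS NOT NULL")        # [1, 2]
only_empty = fetch_ids("MyColumn = ''")             # [2]
# The NULL row drops out here: NULL <> '' evaluates to NULL, not true.
not_empty = fetch_ids("MyColumn <> ''")             # [1]

# Treating NULL and '' as the same "no value" needs extra wrapping.
no_value_coalesce = fetch_ids("COALESCE(MyColumn, '') = ''")   # [2, 3]
no_value_nullif = fetch_ids("NULLIF(MyColumn, '') IS NULL")    # [2, 3]
```

Note that `IS NOT NULL` and `<> ''` return different rows as soon as both kinds of "empty" exist in the column, which is where the extra COALESCE/NULLIF wrapping tends to creep in.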

So, your colleague is partially correct but may not appreciate it fully.

Blindly using an empty string is use of a sentinel value rather than working through the NULL semantic issue.

FWIW and personally:

  • I would tend to use NULL but don't always. I like to avoid dates like 31 Dec 9999 which is where NULL avoidance leads you.

  • From Cade Roux's answer... I also find discussions about "Is date of death nullable" pointless. For a field, in practical terms, either there is a value or there isn't.

  • Sentinel values are worse than NULLs. Magic numbers, anyone?

gbn
3

Tell that guy on your team to get his prematurely optimizin' head out of his ass! (But in a nice way).

Developers like that can be poison to the team, full of low-level optimization myths, all of which may be true or have been true at one point in time for some specific vendor or query pattern, or possibly only true in theory but never true in practice. Acting upon these myths is a costly waste of time, and can destroy an otherwise good design.

He probably means well and wants to contribute his knowledge to the team. Unfortunately, he is wrong. Not wrong in the sense of whether a benchmark will prove his statement correct or incorrect. He's wrong in the sense that this is not how you design a database. The question of whether to make a field NULL-able is a question about the domain of the data, for the purposes of defining the type of the field. It should be answered in terms of what it means for the field to have no value.

John
1

In a nutshell, NULL = UNKNOWN! Using the date of death example, that means the entity could be 1) alive, 2) dead but with an unknown date of death, or 3) not known to be dead or alive. For numeric columns I always default them to 0 (zero), because somewhere along the line you may have to perform aggregate calculations, and NULL + 123 = NULL. For alphanumerics I use NULL, since it's the least expensive performance-wise and it's easier to say '...where a IS NULL' than '...where a = "" '. Using '...where a = " "[space]' is not a good idea because [space] is not a NULL! For dates, if you have to leave a date column NULL, you may want to add a status indicator column which, in the above example, would hold A = alive, D = dead, Q = dead with date of death not known, or N = not known whether alive or dead.
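A minimal sketch of the status indicator idea, again using Python's built-in `sqlite3` module. The `Person` table, its column names, and the four sample rows (including the date) are invented for illustration; only the A/D/Q/N codes come from the text above. It also shows the `NULL + 123` arithmetic and why counting the date column alone undercounts the dead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        Id INTEGER PRIMARY KEY,
        DtOfDeath TEXT,  -- NULL whenever no date is recorded
        Status TEXT NOT NULL CHECK (Status IN ('A', 'D', 'Q', 'N'))
    )
""")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        (1, None, "A"),          # alive
        (2, "2005-08-29", "D"),  # dead, date known (made-up date)
        (3, None, "Q"),          # dead, date not known
        (4, None, "N"),          # not known whether alive or dead
    ],
)

# Arithmetic with NULL yields NULL (None in Python).
null_plus_123 = conn.execute("SELECT NULL + 123").fetchone()[0]

# COUNT(column) skips NULLs, so it counts recorded dates, not dead people.
dated_deaths = conn.execute("SELECT COUNT(DtOfDeath) FROM Person").fetchone()[0]

# The status column distinguishes the cases that NULL alone lumps together.
known_dead = conn.execute(
    "SELECT COUNT(*) FROM Person WHERE Status IN ('D', 'Q')"
).fetchone()[0]
```

Here `dated_deaths` is 1 while `known_dead` is 2, which is the gap the extra column is meant to close.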

Joe R.