76

Outside of the argument of whether or not NULLs should ever be used: I am responsible for an existing database that uses NULL to mean "missing or never entered" data. It is different from empty string, which means "a user set this value, and they selected 'empty'."

Another contractor on the project is firmly on the "NULLs do not exist for me; I never use NULL and nobody else should, either" side of the argument. However, what confuses me is that since the contractor's team DOES acknowledge the difference between "missing/never entered" and "intentionally empty or indicated by the user as unknown," they use a single character 'Z' throughout their code and stored procedures to represent "missing/never entered" with the same meaning as NULL throughout the rest of the database.

Although our shared customer has asked for this to be changed, and I have supported this request, the team cites this as "standard practice" among DBAs far more advanced than I; they are reluctant to change to use NULLs based on my ignorant request alone. So, can anyone help me overcome my ignorance? Is there any standard, or small group of individuals, or even a single loud voice among SQL experts which advocates the use of 'Z' in place of NULL?

Update

I have a response from the contractor to add. Here's what he said when the customer asked for the special values to be removed to allow NULL in columns with no data:

Basically, I designed the database to avoid NULLs whenever possible. Here is the rationale:

A NULL in a string [VARCHAR] field is never necessary because an empty (zero-length) string furnishes exactly the same information.

A NULL in an integer field (e.g., an ID value) can be handled by using a value that would never occur in the data (e.g, -1 for an integer IDENTITY field).

A NULL in a date field can easily cause complications in date calculations. For example, in logic that computes date differences, such as the difference in days between a [RecoveryDate] and an [OnsetDate], the logic will blow up if one or both dates are NULL -- unless an explicit allowance is made for both dates being NULL. That's extra work and extra handling. If "default" or "placeholder" dates are used for [RecoveryDate] and [OnsetDate] (e.g., "1/1/1900") , mathematical calculations might show "unusual" values -- but date logic will not blow up.

NULL handling has traditionally been an area where developers make mistakes in stored procedures.

In my 15 years as a DBA, I've found it best to avoid NULLs wherever possible.

This seems to validate the mostly negative reaction to this question. Instead of applying an accepted 6NF approach to designing out NULLs, special values are used to "avoid NULLs wherever possible." I posted this question with an open mind, and I am glad I learned more about the "NULLs are useful / NULLs are evil" debate, but I am now quite comfortable labeling the 'special values' approach to be complete nonsense.

an empty (zero-length) string furnishes exactly the same information.

No, it doesn't; in the existing database we are modifying, NULL means "never entered" and empty string means "entered as empty".

NULL handling has traditionally been an area where developers make mistakes in stored procedures.

Yes, but those mistakes have been made thousands of times by thousands of developers, and the lessons and caveats for avoiding those mistakes are known and documented. As has been mentioned here: whether you accept or reject NULLs, representation of missing values is a solved problem. There is no need to invent a new solution just because developers continue make easy-to-overcome (and easy-to-identify) mistakes.


As a footnote: I have been a DBE and developer for more than 20 years (which is certainly enough time for me to know the difference beetween a database engineer and a database administrator). Throughout my career I have always been in the "NULLs are useful" camp, though I was aware that several very smart people disagreed. I was extremely skeptical about the "special values" approach, but not well-versed enough in the academics of "How To Avoid NULL the Right Way" to make a firm stand. I always love learning new things—and I still have lots to learn after 20 years. Thanks to all who contributed to make this a useful discussion.

Community
  • 1
  • 1
Boris Nikolaevich
  • 1,451
  • 11
  • 21
  • 60
    NULL exists to enable [trinary logic](http://en.wikipedia.org/wiki/Three-valued_logic) which is necessary to preserve referential integrity in the absence of complete information - I would call complete and utter BS on any proclaimed DB expert who is adamantly against them! – gordy Jul 10 '11 at 00:12
  • 17
    Never heard of this practice at all. – Calvin Allen Jul 10 '11 at 00:14
  • 4
    There are some links to outspoken individuals who are against NULL and 3VL in general on the [wikipedia article for NULL](http://en.wikipedia.org/wiki/Null_%28SQL%29#Controversy) – gordy Jul 10 '11 at 00:28
  • 14
    Has the contractor proposed a surrogate NULL for numerical data too? – Andriy M Jul 10 '11 at 00:51
  • 14
    @Andriy: That's easy to solve, all the experts store numbers in character fields and cast (with Z-checks!) as needed. Oh wait, [I'm on the wrong site](http://thedailywtf.com/). – mu is too short Jul 10 '11 at 00:55
  • 8
    I'm late, but add me to the camp of "WTF?!". I think you need to give this contractor [one of these](http://www.despair.com/mis24x30prin.html) – OMG Ponies Jul 10 '11 at 01:10
  • 2
    Maybe he's done Verilog before and erroneously equates Z (don't care) with NULL? – hyperslug Jul 10 '11 at 02:46
  • 7
    You'll have a problem when they comes that a 'Z' is a valid value. And the day *will* come. – Petruza Jul 10 '11 at 03:34
  • 12
    I suspect that at one time, this contractor tried to execute `WHERE Column = NULL` and was confused as to why he didn't get any results. – Mike Caron Jul 10 '11 at 03:52
  • 2
    @Gordy Date and Darwen are *against* NULL and even if we don't agree with them I think it's fair to say they're competent database experts. Also (not database related) Hoare talk "Null References: The Billion Dollar Mistake" is an important expert's opinion since Hoare invented *"invented"* NULL in ALGOL. – Alex Jasmin Jul 10 '11 at 05:21
  • @Alexandre - Totally valid points, though this question was in the context of a production system which already uses NULL (for good or ill). – Boris Nikolaevich Jul 10 '11 at 05:30
  • 5
    @Boris, if you want to read something serious and well-grounded by people who are against `NULL`, may I recommend the short article **["How to handle missing information without using NULLs"](http://www.dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without-nulls.pdf)** (link to a PDF from [The Third Manifesto homepage](http://www.thethirdmanifesto.com/)) -- Incidentally, I don't think `NULL`s are always bad, but replacing it with `'Z'` is almost definitely a bad idea. – stakx - no longer contributing Jul 10 '11 at 09:02
  • 4
    The fundamental question is *how* to store missing data (or, I guess, rather, how *not* to store it...or something...the language here gets in the day). Whether your preferred choice is to store `null` in the database or move to 6NF tables and simply not add records for missing data is immaterial; the fact is that *missing data is a solved problem*. You can do either, and both have their followers and detractors, but *nobody* (and I'm willing to make that generalization) who knows what they're doing will recommend a general-purpose magic value that is a wholesale replacement for `null`. – Adam Robinson Jul 11 '11 at 00:07
  • 2
    The only reason I can see to avoid using null is that you think you might forget to check for it. You are at least as likely to forget to check for "Z". So then you go to the fail early fail hard practice. If you are going to screw it up, might as well see it in development rather than having "Z"s show up instead of "Payee's Address" after it's deployed. – Bill K Jul 11 '11 at 07:57
  • 2
    Not fair downvoting the question just because you realized I am talking about you; I really was trying to understand where this 'standard' came from since I have not subscribed to the Never NULL design theory in the past, and therefore have not researched all the workarounds and common practices for avoiding NULL. – Boris Nikolaevich Jul 11 '11 at 19:03
  • 3
    I gave this analogy when describing this post to a family member: "Walking across the street can be dangerous; it has traditionally been an area where pedestrians can get hurt if they are not careful. In my 15 years as a pedestrian, I try to avoid walking across the street if at all possible. Therefore, I always _skip_ across the street instead." – Boris Nikolaevich Jul 13 '11 at 20:42
  • 2
    LOL :) Based on your update; Your contractor has grossly over played a legitimate concern. To the point that he has then grossly over simplified his mitigating actions. Placeholder values encode information implicitly rather than explicitly, require extra documentation, can collide with legitimate data, can cause silent failures in logic, require handling code anyway. The concern is valid, the solution is non-standard, naive and dangerous, and there are industrially Standard alternative approaches. – MatBailie Jul 14 '11 at 07:32
  • 1
    @BorisNikolaevich: Not *"Therefore, I always skip across the street instead."* but "Therefore, I always make a Z-bridge to cross a street instead." – ypercubeᵀᴹ Dec 01 '11 at 09:46
  • What happens as a user if "Z" is a valid entry of a requirement? – xQbert Dec 23 '11 at 10:57
  • So, when they do a left and right joins do they coalesce the "Z" values into their results to avoid the NULLS on the joined tables? Where does the insanity end? – xQbert Dec 23 '11 at 11:05

8 Answers8

105

Sack your contractor.

Okay, seriously, this isn't standard practice. This can be seen simply because all RDBMS that I have ever worked with implement NULL, logic for NULL, take account of NULL in foreign keys, have different behaviour for NULL in COUNT, etc, etc.

I would actually contend that using 'Z' or any other place holder is worse. You still require code to check for 'Z'. But you also need to document that 'Z' doesn't mean 'Z', it means something else. And you have to ensure that such documentation is read. And then what happens if 'Z' ever becomes a valid piece of data? (Such as a field for an initial?)

At a basic level, even without debating the validity of NULL vs 'Z', I would insist that the contractor conforms to standard practices that exist within your company, not his. Instituting his standard practice in an environment with an alternative standard practice will cause confusion, maintenance overheads, mis-understanding, and in the end increased costs and mistakes.


EDIT

There are cases where using an alternative to NULL is valid in my opinion. But only where doing so reduces code, rather than creating special cases which require accounting for.

I've used that for date bound data, for example. If data is valid between a start-date and an end-date, code can be simplified by not having NULL values. Instead a NULL start-date could be replaced with '01 Jan 1900' and a NULL end-date could be replaced with '31 Dec 2079'.

This still can change behaviour from what may be expected, and so should be used with care:

  • WHERE end-date IS NULL no longer give data that is still valid
  • You just created your own millennium bug
  • etc.

This is equivalent to reforming abstractions such that all properties can always have valid values. It is markedly different from implicitly encoding specific meaning into arbitrarily chosen values.

Still, sack the contractor.

Yi Jiang
  • 49,435
  • 16
  • 136
  • 136
MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • 21
    +1 from me; spot on: "I would actually contend that using 'Z' or any other place holder is worse. You still require code to check for 'Z'. But you also need to document that 'Z' doesn't mean 'Z', it means something else. " – Mitch Wheat Jul 10 '11 at 00:17
  • 21
    What we need is a special value -- not NULL, since NULL is evil -- to represent missing data. Something that's different from all other values, maybe even from itself (since, two unknowns can't be equated simply because they are unknown). Some columns obviously wouldn't make sense with this value, and so it should be forbidden. To make things easy, we'd need special operators like IS UNKNOWN or IS NOT UNKNOWN. – Mike Caron Jul 10 '11 at 03:51
  • Brilliant suggestion, Mike, but I propose we use the term "Z" instead of UNKNOWN. As is, `... FROM FOO WHERE AGE IS Z`. – WW. Jul 10 '11 at 04:49
  • 5
    Contractors often have good advice from deep experience, but just because that *sometimes* happens, doesn't mean you have to follow the sheep over the recommended perilous cliff. Inform them that you are the master and owner of the database: development will be as specified: comply or die. – wallyk Jul 10 '11 at 07:44
  • 2
    If a user enters Z, then obviously you store ZZ. If they enter ZZ, you store ZZZ, and so on. This requires you to make all of your columns one character larger, but that shouldn't be a problem. – Chas. Owens Jul 10 '11 at 09:37
  • Also, name and shame your contractor so we can be sure that we never hire him either. – Tom O'Connor Jul 10 '11 at 13:11
  • 1
    @Tom - I'd be willing to do that for anyone who contacts me directly, but not on a public site. I'm sure there are people who feel the same way about my contracting and development skills, and I'd rather not have Karma's payback for publicly outing a developer with whom I disagree. Oh, and with whom most of the StackOverflow community disagrees, apparently. – Boris Nikolaevich Jul 11 '11 at 19:13
  • 2
    +1 from me in general - but especially for the edit where it can make sense to use fence post values for date ranges (min date / max date) because of how much code it can save - especially if you are having to compare/check for overlap in date ranges. In these cases, min date means "since always" and max date means "until forever" which is different from NULL meaning "not sure" or "don't care". – Joel Brown Jul 11 '11 at 23:44
26

This is easily one of the weirdest opinions I've ever heard. Using a magic value to represent "no data" rather than NULL means that every piece of code that you have will have to post-process the results to account/discard the "no-data"/"Z" values.

NULL is special because of the way that the database handles it in queries. For instance, take these two simple queries:

select * from mytable where name = 'bob';
select * from mytable where name != 'bob';

If name is ever NULL, it obviously won't show up in the first query's results. More importantly, neither will it show up in the second queries results. NULL doesn't match anything other than an explicit search for NULL, as in:

select * from mytable where name is NULL;

And what happens when the data could have Z as a valid value? Let's say you're storing someone's middle initial? Would Zachary Z Zonkas be lumped in with those people with no middle initial? Or would your contractor come up with yet another magic value to handle this?

Avoid magic values that require you to implement database features in code that the database is already fully capable of handling. This is a solved and well understood problem, and it may just be that your contractor never really grokked the notion of NULL and therefore avoids using it.

unpythonic
  • 4,020
  • 19
  • 20
22

If the domain allows missing values, then using NULL to represent 'undefined' is perfectly OK (that's what it is there for). The only downside is that code that consumes the data has to be written to check for NULLs. This is the way I've always done it.

I have never heard of (or seen in practice) the use of 'Z' to represent missing data. As to "the contractor cites this as 'standard practice' among DBAs", can he provide some evidence of that assertion? As @Dems mentioned, you also need to document that 'Z' doesn't mean 'Z': what about a MiddleInitial column?

Like Aaron Alton and many others, I believe that NULL values are an integral part of database design, and should be used where appropriate.

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • 3
    I think the key here is "If the domain allows missing values..." It seems to me that there is a time and a place to espouse the use of NULLs, and a time & place to eschew them, and it takes some wisdom to know the difference. I get the feeling sometimes that when a junior DBE/DBA reads a caveat like, "NULL values can cause unexpected results in queries and calculations if you are do not account for their behavior," his knee-jerk reaction is to label all NULL usage as bad. Once it becomes a religiously-held opinion, it sticks with him for the rest of his career. – Boris Nikolaevich Jul 10 '11 at 01:02
  • 1
    Forgetting the WHERE clause on a DELETE or UPDATE can cause harm to your database => never use them. Either get the data right the first time or open the table in an editor and do it yourself. – MatBailie Jul 10 '11 at 01:12
  • Also, please note that OUTER joins yield NULLs and so should not be used. Ditto, ROLL UP. – MatBailie Jul 10 '11 at 03:16
  • 3
    Z is used to represent a GMT timezone in some standards. – Erick Robertson Jul 10 '11 at 04:17
  • 2
    @Erick, that's one more reason not to use Z to mean "no value." – Boris Nikolaevich Jul 10 '11 at 09:54
17

Even if you somehow manage to explain to all your current and future developers and DBAs about "Z" instead of NULL, and even if they code everything perfectly, you will still confuse the optimizer because it will not know that you've cooked this up.

Using a special value to represent NULL (which is already a special value to represent NULL) will result in skews in the data. e.g. So many things happened on 1-Jan-1900 that it will throw out the optimizer's ability to understand that actual range of dates that really are relevant to your application.

This is like a manager deciding: "Wearing a tie is bad for productivity, so we're all going to wear masking tape around our necks. Problem solved."

WW.
  • 23,793
  • 13
  • 94
  • 121
  • 10
    +1 just for the phrase "Using a special value to represent NULL (which is already a special value to represent NULL)" . . . – Mike Sherrill 'Cat Recall' Jul 10 '11 at 01:54
  • I thought that a bow-tie was exactly that, a neck tie which was replaced with masking-tape deemed more appropriate for the occasion... – Soren Jul 10 '11 at 05:52
10

I've never heard about the wide-spread use of 'Z' as a substitute for NULL.

(Incidentally, I'd not particularly like to work with a contractor who tells you in the face that they and other "advanced" DBAs are so much more knowledgeable and better than you.)

 +=================================+
 |  FavoriteLetters                |
 +=================================+
 |  Person      |  FavoriteLetter  |
 +--------------+------------------+
 |  'Anna'      |  'A'             |
 |  'Bob'       |  'B'             |
 |  'Claire'    |  'C'             |
 |  'Zaphod'    |  'Z'             |
 +---------------------------------+

How would your contractor interpret the data from the last row?

Probably he would choose a different "magic value" in this table to avoid collision with the real data 'Z'? Meaning you'd have to remember several magic values and also which one is used where... how is this better than having just one magic token NULL, and having to remember the three-valued logic rules (and pitfalls) that go with it? NULL at least is standardized, unlike your contractor's 'Z'.

I don't particularly like NULL either, but mindlessly substituting it with an actual value (or worse, with several actual values) everywhere is almost definitely worse than NULL.

Let me repeat my above comment here for better visibility: If you want to read something serious and well-grounded by people who are against NULL, I would recommend the short article "How to handle missing information without using NULLs" (links to a PDF from The Third Manifesto homepage).

stakx - no longer contributing
  • 83,039
  • 20
  • 168
  • 268
5

Nothing in principle requires nulls for correct database design. In fact there are plenty of databases designed without using null and there are plenty of very good database designers and whole development teams who design databases without using nulls. In general it's a good thing to be cautious about adding nulls to a database because they inevitably lead to incorrect or ambiguous results later on.

I've not heard of using Z being called "standard practice" as a placeholder value instead of nulls but I expect your contractor is referring to the concept of sentinel values in general, which are sometimes used in database design. However, a much more common and flexible way to avoid nulls without using "dummy" data is simply to design them out. Decompose the table such that each type of fact is recorded in a table that doesn't have "extra", unspecified attributes.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • 1
    I think the contractor literally means to use 'Z' for "don't know". – wallyk Jul 10 '11 at 07:37
  • Unfortunately, @wallyk is basically correct: this is not an academic or theoretical discussion; since I am a developer myself, I have been through the code and the stored procedures. The contractor is using the literal character 'Z' for missing/not entered values. (Values which are, in fact, "unknown but answered" are never NULL even in the current database design; both use empty string for text fields or the character 'U' for drop-down lists to indicate that the user did answer the question and the answer was "I don't know.") – Boris Nikolaevich Jul 10 '11 at 07:48
  • 1
    @dportas - I do recognize that correct database design does not require the use of nulls, but since I am in the "There is a time and a place to use NULL if you know how to do so correctly," the main purpose of the question was to understand whether the use of 'Z' in a good database design by someone of the "NoNULL" camp was either standard, common, or promoted by anyone. – Boris Nikolaevich Jul 10 '11 at 07:50
3

In reply to contractors comments

  • Empty string <> NULL
  • Empty string requires 2 bytes storage + an offset read
  • NULL uses null bitmap = quicker
  • IDENTITY doesn't always start at 1 (why waste half your range?)

The whole concept is flawed as per most other answers here

gbn
  • 422,506
  • 82
  • 585
  • 676
1

While I have never seen 'Z' as a magic value to represent null, I have seen 'X' used to represent a field that has not been filled in. That said, I have only ever seen this in one place, and my interface to it was not a database, but rather an XML file… so I would not be prepared to use this an argument for being common practice.

Note that we do have to handle the 'X' specially, and, as Dems mentioned, we do have to document it, and people have been confused by it. In our defence, this is forced on us by an external supplier, not something that we cooked up ourselves!

Paul Wagland
  • 27,756
  • 10
  • 52
  • 74
  • That would be very confusing for databases which store checkbox selections checked with a char field of 'X', not checked ' ' (space). I hope antimatter and matter don't get mixed into the same database.... – wallyk Jul 10 '11 at 07:39
  • I think this didn't get any votes because it's not directly related to the original Database Design question, but I at least have to say that even this "tangential" response only serves to underscore the absurdity of the contractor's approach. (Also, I think that "No Votes" should be replaced with "Z" from here on out.) – Boris Nikolaevich Jul 23 '11 at 04:58
  • The only answer to the question. – Pindatjuh Aug 03 '11 at 21:13