20

I'm just stepping into a project and it has a fairly large database backend. I've started digging through this database and 95% of the fields are nullable.

Is this normal practice in the database world? I'm just a lowly programmer, not a DBA but I would think you would want to keep nullable fields to a minimum, only where they make sense.

Is it a "code smell" if most columns are nullable?

  • 2
    If the DB schema is still evolving and new columns are being added, it may be easier to make the new columns null in the beginning which can mean a lot of nulls if there are a lot of columns added to tables within the DB. That seems to be where I'd see them popping up. – JB King Jun 23 '09 at 21:37
  • It's also a [stupidly-hard task](https://dba.stackexchange.com/questions/208530) to introduce new columns to a table as NOT NULL if you can't rely on defaults. – Elaskanator Aug 29 '18 at 16:33

17 Answers17

18

Default values are typically the exception and NULLs are the norm, in my experience.

True, nulls are annoying.

It's also extremely useful because null is the best indicator of "NO VALUE". A concrete default value is very misleading, and you can lose information or introduce confusion down the road.

hythlodayr
  • 2,377
  • 15
  • 23
  • 2
    the OP doesn't say if they are using MySQL. The MySQL manual says: "Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. ..." http://dev.mysql.com/doc/refman/5.5/en/data-size.html – Greg K Sep 05 '13 at 13:03
14

Anyone who has developed a data entry application knows how common it is for some of the fields to be unknown at the time of entry -- even for columns that are business-critical, to address @Chris McCall's answer.

However, a "code smell" is merely an indicator that something might be coded in a sloppy way. You use smells to identify things that need more investigation, not necessarily things that must be changed.

So yes, if you see nullable columns so consistently, you're right to be suspicious. It might indicate that someone was being lazy, or afraid to declare NOT NULL columns unequivocally. You can justify doing your own analysis.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
11

I'm of the Extreme NO camp: I avoid NULLs all the time. Putting aside fundamental considerations about what they actually mean (because talk to different people, you'll get different answers such as "no value", "unknown value", "missing", "my ginger cat called Null"), the worst problem NULLs cause is that they often ruin your queries in mysterious ways.

I've lost count of the number of times I've had to debug someone's query (okay, maybe 9) and traced the problem to a join against a NULL. If your code needs ISNULL to repair joins then the chances are you've also lost index applicability and performance with it.

If you do have to store a "missing/unknown/null/cat" value (and it's something I prefer to avoid), it is better to be explicit about it.

Those skilled at NULLs may disagree. NULL use tends to split SQL crowds down the middle.

In my experience, heavy NULL use has been positively correlated with database abuse but I wouldn't carve this into stone tablets as some Law of Nature. My experience is just my experience.

EDIT: Additional thought. It is possible that those who are anti-null racists like myself are more excited by normalization than those who are pro-NULL. I don't think rabid normalizers would be too happy with ragged edges on their tables that can take NULLs. Lots of nulls may indicate that the the database developers are not into heavy normalisation. So rather than NULL suggesting code is "bad" it may alternatively suggest the philosophical position of the developers on normalisation. Maybe this is reaching. Just a thought.

Joel Goodwin
  • 5,026
  • 27
  • 30
  • 1
    How do you feel about the integer value zero, which results in division-by-zero errors when used improperly? Does this mean we should disallow the use of zero? – Bill Karwin Jun 23 '09 at 22:27
  • 1
    Also, your cat example is spurious. The string 'Null' is not the same as a SQL NULL. But it does make me wonder how you'd make a poster searching for that cat if it went missing. ;-) – Bill Karwin Jun 23 '09 at 22:30
  • If it's Schrodinger's cat, then it can be alive or dead, so makes it relevant in a sort of "what is it?" way =) Div by zero errors are consistent, in your face and pretty obvious; it's a fundamental arithmetic problem that's you have to live with. NULL tends to be stealthy like a ninja, you're not sure you've got a NULL problem until you've been had - plus NULL join behaviour can be inconsistent across platforms. I would argue that it isn't a fundamental concept with a definitive meaning and standard set of behaviours, unlike div by zero. (And 'Null' != SQL NULL has caught many a dev out) – Joel Goodwin Jun 23 '09 at 22:58
  • Re: the divide by zero comment, that's an application issue, not a database one. I personally am a big fan of 0 value for an int column, rather than null. Dividing by null can be its own issue. – Chris Jun 23 '09 at 22:59
  • 1
    My point is that zero and NULL both deserves some special handling. ANSI SQL is clear on the standard semantics of NULL (Oracle's VARCHAR2 behavior notwithstanding). And anyone who can't tell the difference between NULL and 'Null' is not paying attention in class! – Bill Karwin Jun 24 '09 at 04:04
  • I would not simply equate the two, these are completely different levels of special. Zero: divide by zero and query fails. NULL: It is not COUNTed as a value; put it amongst some operations and it acts as a super-zero, wiping out the entire operation with NULL; it cannot be used in equality, nor inequality - "val <> 3" will not return the val=NULL row) unless you are explicit with IS NULL. This leads to the surprise that SELECT * FROM data WHERE (val=3) AND (val<>3) does not return all rows. More on wikipedia, http://en.wikipedia.org/wiki/Null_(SQL). – Joel Goodwin Jun 24 '09 at 10:50
  • 1
    I see your point, but I view NULL as a very similar case to zero, with respect to the fact that you have to check for the presence of NULL when writing certain types of expressions. I do understand the semantics of NULL, but I maintain that it's a useful and valid part of the language when used correctly. Just because some developers don't understand how it works doesn't mean we should make blanket rules against using it. – Bill Karwin Jun 25 '09 at 16:36
  • 2
    I totally agree. As I pointed out - those skilled at using NULLs would probably disagree with my stance. Putting aside arguments about relational fundamentals, I would summarise my position as this: that the NULL is so exceptional in operation that it's more likely to cause harm than normalising them out of the solution. In over a decade of SQL work, I have met many who used NULLs, but few who were aware of it's subtleties [maybe it's a symptom of my sector - but that's another topic]. Bill, I count you in as being aware =) – Joel Goodwin Jun 25 '09 at 18:13
  • Thanks. Yeah, I agree 100% that a lot of developers don't understand how to use NULL correctly. I'm writing a book on "SQL Antipatterns" and I have a chapter about misuse of NULL -- either treating NULL as an ordinary value, or treating an ordinary value as NULL. – Bill Karwin Jun 25 '09 at 19:06
7

Don't know if I consider it always a bad thing, but if the columns are being added because a single record (or maybe a few) need to have values while most don't, then it indicates a pretty flat table structure. If you're seeing column names like "addr1", "addr2", "addr3", then it stinks!

I would bet that most of the columns you have could be removed and represented in other tables. You could find the "non-null" ones through a foreign key relationship. This will increase the joins that you'll be doing, but it could be more preformant that doing a "where not col1 is null".

Todd R
  • 18,236
  • 8
  • 31
  • 39
  • 1
    How would you store the various lines of an address other than columns named addr1, addr2, addr3? (OR were you referring 3 separate complete addresses?) Addresses are one of the standard example cases for nulls. Some addresses have 2 lines, some have 6. – jmucchiello Jun 23 '09 at 20:32
  • 2
    I interpreted Addr1 through AddrX to be placeholders for mailingAddress, physicalAddress, workAddress, xmasAddress, etc. Otherwise, it'd be AddrLine1, AddrLine2. – p.campbell Jun 23 '09 at 21:19
  • Yeah, maybe address was a bad example - probably should have used phone numbers. What typically shows up (in bad schemas) is "homeaddr", "workaddr", "vacationaddr", "otheraddr", "otheraddr2", etc., all because one record needed "workaddr", another needed "vacationaddr" (without "workaddr"), and so on. AddrLine1 and AddrLine2 are fine. – Todd R Jun 24 '09 at 13:19
7

I think nullable columns should be avoided. Wherever the semantics of the domain make it possible to use a value that clearly indicates missing data, it should be used instead of NULL.

For instance, let's imagine a table that contains a Comment field. Most developers would place a NULL here to indicate that there's no data in the column. (And, hopefully, a check constraint that disallows zero-length strings so that we have a well-known "value" to indicate the lack of a value.) My approach is usually the opposite. The Comment column is NOT NULL and a zero-length string indicates the lack of a value. (I use a check constraint to ensure that the zero-length string is really a zero-length string, and not whitespace.)

So, why would I do this? Two reasons:

  1. NULLs require special logic in SQL, and this technique avoids that.
  2. Many client-side libraries have special values to indicate NULL. For instance, if you use Microsoft's ADO.NET, the constant DBNull.Value indicates a NULL, and you have to test for that. Using a zero-length string on a NOT NULL column obviates the need.

Despite all of this, there are many circumstances in which NULLs are fine. In fact, I have no objection to their use in the scenario above, although it wouldn't be my preferred way.

Whatever you do, be kind to those who will use your tables. Be consistent. Allow them to SELECT with confidence. Let me explain what I mean by this. I recently worked on a project whose database was not designed by me. Nearly every column was nullable and had no constraints. There was no consistency about what represented the absence of a value. It could be NULL, a zero-length string, or even a bunch of spaces, and often was. (How that soup of values got there, I don't know.)

Imagine the ugly code a developer has to write to find all of those records with a missing Comment field in this scenario:

SELECT * FROM Foo WHERE LEN(ISNULL(Comment, '')) = 0

Amazingly there are developers who regard this as perfectly acceptable, even normal, despite possible performance implications. Better would be:

SELECT * FROM Foo WHERE Comment IS NULL

Or

SELECT * FROM Foo WHERE Comment = ''

If your table is properly designed, the above two SQL statements can be relied upon to produce quality data.

Gregory Higley
  • 15,923
  • 9
  • 67
  • 96
  • 7
    I have to disagree. NULL means unknown, regardless of the data type of the column. It should always be used to mean unknown, and magic values like empty string should never be used to mean unknown. – John Saunders Jun 23 '09 at 21:37
  • On the other hand, if we know that the user elected not to leave a comment, why would we use something that means "unknown" to represent that knowledge? – Joel Mueller Jun 23 '09 at 21:50
  • 1
    @john-saunders It depends on the domain. A zero-length string in a comment field can represent a very clearly-defined, extremely well-known value: "no comment". This is quite different from "unknown". However, these are semantic games that even Dr. Codd played. He later came up with several alternatives to NULL to indicate things like unknown, missing, etc. The important thing here is to *be consistent*. – Gregory Higley Jun 23 '09 at 21:56
  • Because the value of the comment is unknown. – John Saunders Jun 23 '09 at 21:57
  • This has nothing to do with the problem domain. If the user left a zero-length comment, then it's a zero-length comment. If no comment was left, then there is no known value for the comment. If the solution chooses to treat these two separate concepts the same, then that's fine, but they're still two separate concepts. – John Saunders Jun 23 '09 at 21:58
  • Perhaps the problem here is that I foolishly (and loosely) used the phrase "missing data" in my first paragraph, which is not exactly what I meant. It's more nuanced than that, and again, it depends upon the semantics of the domain, but it's something I try not to get too pedantic about. I should stress that I *do* use nullable columns. There are many situations in which they are completely appropriate. It depends on the meaning and type of the data you are storing. – Gregory Higley Jun 23 '09 at 21:59
  • 2
    For the comment example, a null value should signify "We never asked the user for a comment", while a "" value should signify "We presented the user a form with an optional comment box, and they submitted the form without putting anything in it". In most cases I can envision there being a "comments" column, the latter situation is the only likely one. – Brian Schroth Oct 26 '09 at 18:22
5

In short, I would say yes, this is probably a code smell.

Whether a column is nullable or not is very important and should be determined carefully. The question should be assessed for every column. I am not a believer in a single "best practices" default for NULL. The "best practice" for me is to address the nullability thoroughly during the design and/or refactoring of the table.

To start with, none of your primary key columns are going to be nullable. Then, I strongly lean towards NOT NULL for anything which is a foreign key.

Some other things I consider:

Criteria where NULL should be strongly avoided: money columns - is there really a possibility that this amount will be unknown?

Criteria where NULL can be justified most frequently: datetime columns - there are no reserved dates, so NULL is effectively your best option

Other data types: char/varchar columns - for codes/identifiers - NOT NULL almost exclusively int columns - mostly NOT NULL unless it's something like "number of children" where you want to distinguish an unknown response.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
2

No, whether or not a field should be nullable is a data concept and can't be a code smell. Whether or not NULLs are annoying to code has nothing to do with the usefulness of having nullable data fields.

jmucchiello
  • 18,754
  • 7
  • 41
  • 61
2

They are a (very common) smell, I'm afraid. Look up C.J. Date writings on the topic.

Bruno Martinez
  • 2,850
  • 2
  • 39
  • 47
  • Really? C. J. Date thinks that NULLs are not a legitimate part of the relational model, and even if they were, are implemented wrong in SQL. So his writings on this topic might be considered to be an extreme opinion. – Bill Karwin Jun 23 '09 at 20:31
  • 4
    They might be considered an extreme opinion if it weren't for the fact that he's right, and has a pretty much airtight case for why he is right (at least as far as the "[nulls] are implemented wrong in SQL" part goes). A type generator like Optional or Maybe (or whatever you want to call it) is a useful thing, but when Missing = Missing evaluates to a magical third boolean called Unknown with all kinds of wacky and inconsistently properties, that's a problem for everyone (including the optimizer). – Doug McClean Jun 23 '09 at 22:51
2

In my experience, it is a problem when Null and Not Null don't match up to the required field /not required field.

It is in the realm of possibility that those really are all optional fields. If you find in the business tier or the UI tier that those fields are required, then I think this means the data model has drifted away from the business object model and is a sign of overly conservative DB change policies, or oversight.

If you run a sample data generator on your data, and then try to load the data that is valid according to SQL, you would find out right away if the rules match up.

MatthewMartin
  • 32,326
  • 33
  • 105
  • 164
1

As a best practice, if a column shouldn't be nullable, then it should be marked as such. However, I don't believe in going completely insane with things like this.

Cody C
  • 4,757
  • 3
  • 29
  • 36
1

I think so. If you don't need the data, then it's not important to your business. If it is important to your business, it should be required.

Chris McCall
  • 10,317
  • 8
  • 49
  • 80
  • Sure, a credit card number is required before a user can buy anything (for example) but they should still be allowed to save other attributes, and then add the credit card number later. If you block them from entering *any* data because they don't have the required fields, that's just going to make them upset. – Bill Karwin Jun 23 '09 at 22:34
  • 1
    Which is why those things don't belong in the same table, not why credit card number should be nullable in the table of credit card orders, right? – Doug McClean Jun 23 '09 at 22:44
  • It's just an example. The point is that there may be attributes of a given table that are important for your business, but not for data integrity. – Bill Karwin Jun 24 '09 at 03:58
  • 1
    I've yet to see a working example where this is true and not a design artifact. – Chris McCall Jun 24 '09 at 15:15
1

This is all completely dependent on the scope and requirements of the project. I wouldn't use number of nullable fields alone as a metric for poorly written or designed code. Have a look at the business domain, if there are many non nullable fields represented there that are nullable in the database, then you have some issues.

Matthew Vines
  • 27,253
  • 7
  • 76
  • 97
0

That seems like a lot, it probably means you should at least investigate. Note that if this is mature product with a lot of data, convincing anyone to change the structure may be difficult. The earlier in the design phase you catch something like this the easier it is to fix up all the related code to adjust for the change.

Whether it is bad that they used the nulls would depend on whether the columns allowing nulls look as if they should be related tables (home phone, cell phone, business phone etc which should be in aspearate phone table) or if they look like things that might not be applicable to all records (possibly could bea related table with a one-to-one relationship)or might not be known at the time of data entry (probably ok). I would also check to see if they in fact alwAys do have a value (then you might be able to change to not null if the information is genuinely required by the busniess logic). If you have a few records with null

HLGEM
  • 94,695
  • 15
  • 113
  • 186
0

One of the many ways to map inheritance (e.g. c# objects) to a database is to create a table for the class at the top of the hierarchy, then add the columns for all the other classes. The columns have to be nullable for when an object of a different subclass is stored in the database. This is called Single-table inheritance mapping (or Map Hierarchy To A Single Table) and is a standard design pattern.

A side effect of Single-table inheritance mapping is that most columns are nullable.


Also in Oracle an empty string (0 length) is considered to be null, therefore in some companies all strings columns are made nullable even on SqlServer. (just because the first customer wants the software on SqlServer does not mean the 2nd customer does not have a Oracle DBA that will not let SqlServer onto there network)

Ian Ringrose
  • 51,220
  • 55
  • 213
  • 317
  • Yet, when it gets to the stage that most columns are null, I think it's time to consider mapping to multiple tables. That will make it possible to enforce some constraints on the derived tables. – John Saunders Jun 23 '09 at 21:39
  • But changing the ORM system the application uses can be a big risk. In the end, the database is there to serve the application not the other way round. (I am a C# programmer not a DBA after all) – Ian Ringrose Jun 23 '09 at 21:57
  • Who said anything about changing the ORM system? Just change how the ORM maps to the underlying database. Besides, this can allow additional constraints to be enforced, improving the quality of the system as a whole. – John Saunders Jun 23 '09 at 22:37
  • Assuming that the ORM system that was chosen 5 years ago lets you – Ian Ringrose Jun 24 '09 at 07:37
  • And as a Developer that is very antiDBA I would say constraints on those tables are inherently bad in the first place. I would fully agree with Ian, the database is meant to serve the application to often I've seen the database dictating the application. That IS wrong. – Chris Marisic Oct 26 '09 at 18:08
0

As mentioned by others, front-facing data entry should allow omittance of many fields. This is complicated by how people interpret the trinary nature of NULL (e.g. empty versus missing).

As such, I am only answering about one facet of database design: foreign keys.

In general, foreign keys do not suffer from the arbitrary nature of business logic, therefore seeing these columns allowing NULL is definitely a code smell.

For example, if you had a [Person] table, in no situation would you ever have a [Person].[FatherID] value that was NULL intentionally.

For a large database, an attempt to save NULL to such a column is likely to occur at some point due to the inevitability of bugs, which would have been brought to light much sooner by having a NOT NULL constraint. So for version 1 or a table, you should never allow nullable columns without justification.

But things get much trickier in an evolving code base, especially one that is staying online and thus requires migration scripting to upgrade. In particular, you may find nullable columns added to tables later on, because properly adding them as non-nullable can be quite hard depending on your integration process.

Furthermore, visual table designers (such as in SQL Server Management Studio and Visual Studio) default to allowing NULL so it could simply be a matter of inadequate code review.


I don't want to attempt a proper answer for flag (i.e. boolean) columns, but I strongly suggest considering how they can be implemented without allowing NULL, since I have usually found ways to avoid nullability even under the constraints of business logic.

Elaskanator
  • 1,135
  • 10
  • 28
-1

In my experience, a lot nullable field in a large database like you have is very normal. Considering it perhaps is used by a lot of applications written by different people. Making columns nullable is annoying but it is perhaps the best way to keep the application robust.

Victor
  • 435
  • 2
  • 8
  • 15
-1

To throw the opposite opinion out there. Every single field in a database should nullable. There is nothing more frustrating than working with a database that on every single insert throws an exception about required this or required that. Nothing should be required.

There is one exception to that, keys. Obviously all primary and foreign keys should be enforced to exist.

It should be the application's job to validate data and the database to simply store and retrieve what you give it. Having it process validation logic even as simple as null or not null makes a project way more complex to maintain for having different rules spread over everything.

Chris Marisic
  • 32,487
  • 24
  • 164
  • 258