45

I want to specify a unique index on a column, but I also need to allow NULL values (multiple records can have NULL values). When testing with PostgreSQL, I see that I can have 1 record with a NULL value, but the next will cause an issue:

irb(main):001:0> u=User.find(5)
  User Load (111.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1  [["id", 5]]
=> #<User id: 5, email: "a@b.com", created_at: "2013-08-28 09:55:28", updated_at: "2013-08-28 09:55:28">
irb(main):002:0> u.email=nil
=> nil
irb(main):003:0> u.save
   (1.1ms)  BEGIN
  User Exists (4.8ms)  SELECT 1 AS one FROM "users" WHERE ("users"."email" IS NULL AND "users"."id" != 5) LIMIT 1
   (1.5ms)  ROLLBACK
=> false

So even if the database allows it, Rails first checks to see if a User exists with a different id and with the email column set to NULL. Is there a way that not only the database can allow it, but Rails will not check first like above as well?

The idea is users don't have to enter an email, but if they do I need to be able to find a user by their email. I know I can create another model to associate users to emails, but I'd much rather do it the above way.

UPDATE: Here's the migration code I had created to add the email column:

class AddEmailToUsers < ActiveRecord::Migration
  def change
    add_column :users, :email, :string
    add_index :users, :email, :unique => true
  end
end

And here's the code I had added to the User model:

validates :email, uniqueness: true

I forgot that I had added the validates call to the User model. So that makes sense that Rails is checking first. I guess the only other question is if it's safe for databases to have a unique index and NULL fields? Is there a way to specify in Rails that I want to validate the email is unique unless it's nil?

at.
  • 50,922
  • 104
  • 292
  • 461
  • 1
    Why does rails check if the user exists? Is it some code in your User model or is it default behavior? Can you post the migration code? – Andreas Lyngstad Aug 28 '13 at 19:17
  • Do you have any validations? I'm guessing that the query is coming a validation. – mu is too short Aug 28 '13 at 19:27
  • You guys are correct, I updated the question. – at. Aug 28 '13 at 19:30
  • 3
    There's no problem with databases having unique indexes/constraints that allow null -- they do by default on those I know of. There are plenty of cases where this is desirable, for example for a person's email address. You might want to prevent two people specifying the same email address but allow people to not specify one. – David Aldridge Aug 28 '13 at 22:11

2 Answers2

59

Your migration will work and will allow multiple null values (for the most database engines).

But your validation for the user class should look like below.

validates :email, uniqueness: true, allow_nil: true
BananaNeil
  • 10,322
  • 7
  • 46
  • 66
aross
  • 5,620
  • 1
  • 25
  • 32
  • 1
    Hope this helps someone: `allow_blank: true` won't work. If your form is submitting a blank email, `"email"=>""`, you'll need to convert it to nil before saving to the database. Something like `params[:user][:email] = nil if params[:user][:email].blank?`. – Bazley Aug 04 '19 at 21:17
36

To clarify why this works at the database level, you have to understand the three-valued logic used in SQL: true, false, null.

null is typically taken to mean unknown, therefore its semantics in operations are usually equivalent to not knowing what that particular value is and seeing if you can still work out an answer. So for instance 1.0 * null is null but null OR true is true. In the first case, multiplication by an unknown is unknown, but in the second, the second half of the conditional makes the whole statement always true so it doesn't matter what is on the left side.

Now when it comes to indexes, the standard does not specify anything so vendors are left to interpret what unknown means. Personally, I think a unique index should be defined as in the PostgreSQL docs:

When an index is declared unique, multiple table rows with equal indexed values will not be allowed

The question should then be what is the value of null = null? The correct answer should be null. So if you read a bit between the lines of those PostgreSQL docs and say that a unique index will disallow multiple rows for which the equality operator returns true for said value then multiple null values should be allowed. This is exactly how PostgreSQL works, so in that setup you can have a unique column with multiple rows having null as a value.

On the other hand, if you wanted to interpret the definition of a unique index to be disallow multiple rows for which the inequality operator does not return false, then you would not be able to have multiple rows with null values. Who would choose to operate in this contrapositive setup? This is how Microsoft SQL Server chooses to define a unique index.

Both of these ways of defining a unique index are correct based on the 2003 SQL standard's definition of null. So it really depends on your underlying database. But that being said, I think the majority operate similar to PostgreSQL.

rokob
  • 565
  • 4
  • 5