REST APIs: Hiding surrogate keys and only exposing natural keys to the user

Question

Suppose we're working with a database that has the following relation:

CREATE TABLE Author (
  first_name VARCHAR NOT NULL,
  last_name VARCHAR NOT NULL,
  birth_date DATE NOT NULL,
  death_date DATE,
  biography TEXT,
  UNIQUE(first_name, last_name, birth_date)
);

In the real world, it's highly, highly improbable that two authors with the same first and last name will have been born on the exact same day. So we consider the combination of an author's first name, last name, and birth date to be a natural key.

However, for the purposes of joining tables and creating foreign keys, this is a bit painful because it means we need to store these three pieces of information over and over and over in our tables. Maybe we have a BookAuthor relation associating Authors with Books.

So we create a serial ID (or a UUID if we wanted to) and treat it as the primary key:

CREATE TABLE Author (
  id SERIAL PRIMARY KEY,
  first_name VARCHAR NOT NULL,
  last_name VARCHAR NOT NULL,
  birth_date DATE NOT NULL,
  death_date DATE,
  biography TEXT,
  UNIQUE(first_name, last_name, birth_date)
);

However, based on the reading I've done here on StackOverflow and other sites, it seems you never want to expose your primary key to the end user because doing so may allow them to ascertain information that they should not be able to. If this were a situation involving user accounts, for example, and we decided to use serial IDs (we shouldn't, but humor me), then someone could potentially figure out how many total users we have in our database, which isn't information that they should see. In this particular case, someone knowing how many total authors we have isn't really a big deal. But I want to follow best practices.

So, if I were to design a REST API for this database, would it be acceptable to use the natural key to uniquely identify resources (Authors) instead of using serial IDs? For example:

https://www.foo.bar/authors/?first=:Terry&last=Pratchett&dob=19480424

My reasoning here is that there's no way for an end user to know what serial ID corresponds to this author in order to query for them.

The main problem with exposing a serial id is usually that a potential attacker might be able to guess the identifiers of other users/produces/whatever. The easy way to handle this is to expose a seemingly random string as your identifier. — Zohar Peled, Jun 01 '20 at 15:13

score 3 · Answer 1 · answered Jun 09 '20 at 23:45

As long as you guarantee this combination of data elements is an unique alternate key (would be good having a database unique key to enforce that) and use it consistently across all the API methods it would be perfectly acceptable to use it to identify resources in a REST API. There is no conceptual flaw in that.

The only minor issue is if you change the natural key, updating name or date of birth, cached data would be invalidated. Not a big deal. GET methods would still be idempotent.

If you choose to use a natural key as identifier and the data elements of this key are editable, keep in mind that you may want to do a redirect to a new URL when you PUT updates and want to keep displaying the same resource.

score 1 · Answer 2 · answered Jun 09 '20 at 19:28

To address the different concerns you are looking at, I would suggest the following:

Generate a sequential id (a numeric sequence), if possible, which will be your primary key. This key is for internal app use (or db internal) and never leaves the backend.
A unique (non-numeric) id (UUID?... or a hash from id+first_name+las_name+b_day) which will be used as the key for access your data through the API calls.
Optionally, you could still have the unique key: first_name+last_name+b_day.

This way you will have a simple sequential Id for your DB, but also an Id that does not expose relevant information about your system.

This may not be perfect, or complete, solution, but could be a good start.

score 0 · Answer 3 · answered Jun 03 '20 at 22:22

So, if I were to design a REST API for this database, would it be acceptable to use the natural key to uniquely identify resources (Authors) instead of using serial IDs?

The short answer to your question is No.

The longer answer is that there are actually two separate concerns here; you mention:

There's no way for an end user to know what serial ID corresponds to this author in order to query for them.

What this implies is that you want the end user to be able to search for an author.

This is different from the standard use-case for a GET route which explicitly requires that the requester knows the unique identifier (read this W3 description of the basic REST methods for more information).

I would recommend that you have two separate APIs, one for retrieving all details of a resource given its UUID, and a second for searching by some fields.

REST APIs: Hiding surrogate keys and only exposing natural keys to the user

3 Answers3