4

First, I know this is a rather subjective question but I need some kind of formal documentation to help me educate my client.

Background - a large enterprise application with hundreds of tables and SP's, all neatly designed with normalized tables and foreign keys using identity columns.

Our client has a few employees writing complex reports in Crystal enterprise using a replicated copy of our production Db.

We have tables that store what I would classify as 'system' base information, such as a list of office locations, or departments within the company, standard set of roles for users, statuses of other objects (open/closed etc), basically data that doesn't change often.

The issue - the report designers and financial analysts are writing queries with hardcoded identity values inside of them. Something like this

SELECT xxx FROM OFFICE WHERE OFFICE_ID = 6

I'm greatly simplifying here, but basically they're using these hard coded int values inside their procedures all over the place.

For SQL developers seeing this will obviously make you facepalm as it's just a built-in instinct not to do this.

However, surprisingly I can't find any documentation or even best practices articles as to why this shouldn't be done.

They would argue it's fine to do this since the values never change, and they're right, within that single system those values won't change, however across multiple environments (staging/QA/Dev) those values can and are absolutely different, making their reporting design approach non-portable and only able to function in 1 isolated server environment.

Do any of the SQL guru's out there have any more in-depth information/articles etc that I can use to help educate my client on why they should avoid this approach?

n4esa
  • 145
  • 10

1 Answers1

5

Seems to me the strongest argument to your report writers is your second to last sentence "...those values can and are absolutely different [between environments]". That would be pretty much the gist of my response to them.

Of course there's always gray area to any question. Identity columns are essentially magic numbers. They have the benefit to the database of being...

  • Small
  • Sequential
  • Fast to seek and join on, sort by and create

...but have the downside of being of completely meaningless, and in effect, randomly assigned (sort the inserts into that table one way, you get a different identity per row than if you sorted the other way). As such, in cases where you have to look up something specific like that, it's common use also include a "business/natural/alternate" key (e.g. maybe (a completely made up example) [CategoryName] where CatgoryName is something short, unique and human readable, while. [CategoryId] is an identity, but not something intended to be sought on)

If you have a website with, say, a dropdown menu, usually the natural key gets put into the visible part of the drop down, and the surrogate/identity key gets passed around on the back end, invisible to the end user.

This gets a little trickier when you have people writing queries directly against the database. If they're owners of the data, they may know things about the larger data structure which they can take advantage of in *cough "clever" ways. If you know the keys wont change and you know what those values are, there might be a case to be made just referencing those. But again, not if they're going to be different when you query a different server.

Of course the flip side is, if you don't want them to use the identity values, you'll have to give them an alternative. And if your tables don't already include a business/natural/alternate key, you're going to have to add one wherever one doesn't already exist.

Also, there's nothing wrong with that alternate key being an integer too (maybe you already have company-wide identifiers for your offices of 1, 2, 3 etc), but the point is that it's deterministic no matter where you run your query.

Xedni
  • 3,662
  • 2
  • 16
  • 27
  • Good answer, I agree with all of that, and we do in fact have exactly what you're describing already ie. with the matching pair Id/Name columns. Trying to have them understand that 'SELECT Id from Category WHERE Name='xxx'' into a variable and using that instead, is something they just don't understand therefore reject, so I was hoping to have a solid reference I can direct them to from a large authority so I can say, don't do it because this: reference. Fire up a new Azure VM and seed different table values and their reporting falls apart whereas our app won't, hard to get that point across – n4esa Oct 17 '17 at 22:06
  • I see what you mean about a dearth of authoritative articles on the subject. It's not *exactly* your question, but there's some good responses in this post as well: https://stackoverflow.com/questions/2001375/sql-avoiding-hard-coding-or-magic-numbers – Xedni Oct 17 '17 at 22:21