1

Below question arose in my head when refactoring complex stored procedure that was a bottleneck in my database... Let me introduce to the topic. Suppose we have lookup/dictionary table like (it contains GUID as Foreign Key to other tables and name which is human readable):

CREATE TABLE [dbo].[PlayerStatus](
    [PlayerStatusId] [uniqueidentifier] NOT NULL,
    [PlayerStatusName] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_PlayerStatus] PRIMARY KEY CLUSTERED 
(
    [PlayerStatusId] ASC
))

and there's a Player table :

    CREATE TABLE [dbo].[Player](
[PlayerId] [uniqueidentifier] NOT NULL,
[PlayerStatusId] [uniqueidentifier] NOT NULL,
[PlayerName] [nchar](10) NOT NULL, 
[PlayerSurname] [nchar](10) NOT NULL, 
CONSTRAINT [PK_Player] PRIMARY KEY CLUSTERED  ( [PlayerId] ASC )) ON [PRIMARY]

Pretty sraightforward.

Let's say that somewhere in the code there's huge query that accesses a great deal of tables:

SELECT ...
FROM Player JOIN PlayerStatus ON Player.PlayerStatusId = PlayerStatus.PlayerStatusId
.....
WHERE PlayerStatus.PlayerStatusName = 'Active' ....

Now, in my stored procedure, according to execution plan Players table was included in resultset at the beginning. Assuing that this is very large table with milions of rows, hash join between Player and PlayerStatus could be time consuming. By optimizing this query I could rewrite it to someting like this

SELECT ...
FROM Player .....
WHERE PlayerStatus.PlayerStatusId = '46bb6a12-4cd9-4b6c-84c2-7444f5f45eb6' ....

And that's exactly what I did in my bottleneck procedure. That way, I removed 4 similar lookup/dictionary tables that contained different type of statuses. To my suprise, I managed to improve performance by 50%, even though I thought that such tables won't impact performance at all. But that's side plot. My question is: what do you thing of hardcoding guids?

EDIT

  • I do have PK indexes on PlayerStatus.PlayerStatusId and Player.PlayerId
  • I do have FK constraint on Player.PlayerStatusId:

ALTER TABLE [dbo].[Player] WITH CHECK ADD CONSTRAINT [FK_Player_PlayerStatus] FOREIGN KEY([PlayerStatusId]) REFERENCES [dbo].[PlayerStatus] ([PlayerStatusId])

  • Player table constains about 2mln records, PlayerStatus table contains about 25 records
dragonfly
  • 17,407
  • 30
  • 110
  • 219
  • I do have FK constraint but didn't include it in code listing – dragonfly Apr 05 '11 at 18:53
  • I do have FK constraint on Player.PlayerStatusId but didn't include it in code listing. Also I have all relevant indexes + Tuning advisor says everything is all right with query. Thanks for tips on optimalization, but basically that's not the point. The thing I'm interested in, is what does community think of hardcoding GUID in queries to avoid joins. – dragonfly Apr 05 '11 at 18:59
  • One more question: How many rows (approximately) do these two tables have? I would also like to see the whole query and not a small part. Do you have more tables joined or just those two? – ypercubeᵀᴹ Apr 05 '11 at 19:00
  • 1
    @dragonfly Don't hardcode a GUID to avoid a join unless you want to inflict maintenance pain on yourself. It's a premature optimization, while the others are just basic best practices. – Cade Roux Apr 05 '11 at 19:02
  • 1
    What I meant to ask was: If we have 10 users which are "Active", do all 10 have the same GUID in their `PlayerStatusId` field? If yes, why you used such a big sized field for a primary key in Status tablε? I would guess one would not have more than 10 or 20 possible statuses. – ypercubeᵀᴹ Apr 05 '11 at 19:06
  • @Cade Roux - I imagine that having hardcoded GUID is more painful then having nice & neat string. But as I said in question: I removed PlayerStatus table from join and performed filtering on Player.PlayerStatusId instead of PlayerStatus.StatusName just for fun, and it turned out to speed up whole query. I was bewildered. – dragonfly Apr 05 '11 at 19:19
  • @dragonfly I'm not a fan of the string either - what about user alteration or localization of that string? Things which are logical need to have columns (or other constructs) which represent the immutable logical state, whether a bit, code or something. – Cade Roux Apr 05 '11 at 19:27
  • @dragonfly I'd like to notice unacceptable slowness then see the actual execution plans and the actual performance difference (not just relative difference) before I worry too much about optimization. – Cade Roux Apr 05 '11 at 19:28
  • @Cade Roux and others: discussion is getting a little out of topic. Of course I use strings in PlayerStatus table not as UI string. It's then replaced with localized version when displayed on Web Form. What I want to say is: how about we focus strictly on my question, leave localization and other gizmos aside :) – dragonfly Apr 05 '11 at 19:33
  • @dragonfly - Answer - Don't hardcode the GUID. That leads to where is the criteria? In the status table. How to specify it? If string is immutable, fine, use that if you want, I prefer a logical IsActive flag. If performance is unacceptable, revisit. – Cade Roux Apr 05 '11 at 19:47
  • @Cade Roux - what's with the IsActive flag? I want to get Players with ONE particular status, it could be Active/Retired/Free Agent/ etc and I have 25 statuses now, and we add new to PlayerStatus table when needed. Apart from this, it's just a sample. String is immutable (by design, not by constraint) - on some point of app life cycle status can be renamed from, let's say: Retired -> Old Prick. However, as you can see, my database schema is designed in a way where Player table contains FK PlayerStatusId , not StatusName. I can't make WHERE clause look like this: Player.StatusName = 'Active' – dragonfly Apr 05 '11 at 20:00

3 Answers3

1

Answer - Don't hardcode the GUID. That leads to where is the criteria? In the status table. How to specify it? If string is immutable, fine, use that if you want, I prefer a logical IsActive flag. If performance is unacceptable, revisit - using info we discussed

Do you have a foreign key constraint?

If you are doing an inner join and DO NOT have a foreign key constraint, each row has to be matched (to satisfy the logic of the inner join) regardless of whether the column is consumed.

If you have a foreign key constraint to a unique column (a PK, obviously), the optimizer knows that there can be one and only one and it can eliminate the need to match because it knows it will be satisified.

Constraints are your friend.

As the other answer indicates, you also need an index on your status foreign key, and I would also review the execution plan to see what exactly is going on.

As far as hardcoding the GUID, it's unusual, since GUIDs are generally quite anonymous.

Also, I typically would have a logical column, like IsActive in the status, since you might have several "statuses" which are equivalent logically in certain circumstances, like Status IN ('Closed', 'Locked', 'Suspended', '') => IsInactive = 1, while only ('Locked') => IsLocked = 1. FWIW, I tend not to use a single status string, but to use physical flags for individual states on accounts and then logical combinations of these as logical flags for query criteria.

I re-read what you posted and as far as your execution plan, this is going to change depending on the statistics in the table. I double very much that the plan would be the same for 100 Players as it would be for 1 Million players - definitely check that before you try to do any premature optimization. Also, in testing, be sure statistics are updated - sometimes a plan which is good for a million rows will freak out for one row.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
  • Hi, I do have FK constraint but didn't include it in code listing. Sorry – dragonfly Apr 05 '11 at 18:52
  • @dragonfly see my updates. Avoid premature optimization. A 50% improvement on an insignificant operation is insignificant. Check the execution plan for several configurations (same schema, indexes etc, just bigger data volumes), just ensure that the statistics are up to date and your plan cache is cleared. – Cade Roux Apr 05 '11 at 19:00
  • 50% on whole query, not operation. Average time for stored procedure query was 20seconds, after changes it plumeted to 10seconds. I agree with you and I realize that it's just the beginning, it should be 1second, but as it created improvement and it made me interested, I created that topic :) And for the record, I'm using the biggest data volumne I have : live enviroment :) – dragonfly Apr 05 '11 at 20:16
  • What do you mean by that: "...If you are doing an inner join and DO NOT have a foreign key constraint, each row has to be matched (to satisfy the logic of the inner join) regardless of whether the column is consumed..." , especially last sentence? Thanks – dragonfly Apr 05 '11 at 20:41
  • @dragonfly OK, I agree your performance is not good if you aren't down to subsecond yet. – Cade Roux Apr 05 '11 at 20:52
  • @dragonfly Because you have an inner join, logically, the database MUST ensure that rows exist on both sides of a join. If a declared foreign key constraint to a unique key already exists, and no columns from the right hand side are actually requested in the query (other than the join) that's sufficient and it's done. If a constraint doesn't exist, then the plan has to include some kind of join/lookup which ensures that a row exists on the right hand side. This is regardless of whether an attribute of the right hand table is consumed. – Cade Roux Apr 05 '11 at 20:53
1

Please keep in mind, clustered indexes are not suggested on GUID columns. Convert your clustered indexes to regular PK indexes and run your queries again. You might notice the difference.

mevdiven
  • 1,902
  • 2
  • 17
  • 33
  • Do you have a reference for that suggestion? If I issue such a thing to my supervisor I might get fired :) All out tables in database have Clustered indexes on GUID PK columns – dragonfly Apr 05 '11 at 19:21
  • 1
    Clustered index needs to be narrow, static, increasing (http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx) - if you use a GUID, it should be a modified COMB Guid or NEWSEQUENTIALID http://stackoverflow.com/questions/362232/is-guid-the-best-identity-datatype-for-databases/362371#362371 - particularly: http://www.informit.com/articles/article.aspx?p=25862 – Cade Roux Apr 05 '11 at 19:52
  • A reference: http://serverfault.com/questions/37251/clustered-indexes-versus-non-clustered-indexes – ypercubeᵀᴹ Apr 05 '11 at 19:59
  • But clustered indexes are good, especially for big tables. So, the best choice maybe not to remove the cluster but to change GUID PKs to a sequential PK, like @Cade: suggested. – ypercubeᵀᴹ Apr 05 '11 at 20:01
  • Clustered index on GUID columns will defragment your indexes like crazy. Agree on Cade Roux, GUIDs for clustered indexes should be retrieved from NEWSEQUENTIALID() function if you have to have clustered index. You can convert your clustered index to regular ones, easy... You won't get benefit of C/I but you'll be OK performance wise. – mevdiven Apr 05 '11 at 20:04
1

Hard coding GUIDs or other numeric IDs may not look very elegant, but from my experience sometimes it proved to be quite beneficial as far as performance is concerned.

Your example is quite simple, but if you had a more complex query with many joins, removing one join could speed up the query. An example in your code would be removing join with PlayerStatus and filter using PlayerStatusID from Player table instead of using PlayerStatusName from PlayerStatus.


There are two more thing to consider when it comes to hard coding GUIDs/IDs:

  1. GUID/ID is usually a PK in a table and referenced by FK, so it's harder to change GUID/ID than it is to change i.e. status name. Changing PlayerStatusName in your example from 'Active' to 'In action' would make your query using PlayerStatusName useless. So using GUIDs/IDs guarantees that queries are based on solid columns (PKs, FKs)
  2. Using GUIDs/IDs in queries requires little discipline if you have multiple environments for example. You need to make sure relevant IDs in tables holding dictionaries (i.e. PlayerStatus) are the same across all database instances.
mike
  • 53
  • 1
  • 5