1

I've recently been chasing an issue with a client's db... solution found, but impossible to recreate.

Essentially, we're doing a

Select * from mytable where ArbitraryColumn = 75

Where MyTable has an Identity column, called 'MyIndentityColumn' - incremented by one in each insert. Naturally, and normally I would assume that the order returned would be the order in which they are inserted (bad assumption, but one which was forced onto me, through an inherited application - which has been patched).

Essentially, I would like suggestions as to why the database, when restored to my local machine (same OS, same SQL server version - 200 sp3) same collation, and same backup instance restored on it, as a test DB on the client site.

When I perform the above select, I get them in order of insert (i.e. identity column ordered ascending). On the client, it seems random (but the same 'random' order each time)...

A few other points:

  1. I have the same collation on my test server as client
  2. Same DB backup restored to a test only I can access
  3. Same SQL server version and service pack
  4. Same OS
  5. Test DB is a new DB - new log and MDF...

I have the problem 'solved' by adding an explicit order by clause but I want to undertand the cause of the issue, given the exact nature of my attempts to recreate it beuing futile, and perfectly recreatable on the client server...

Thanks in advance,

Dave

Dave
  • 281
  • 1
  • 3
  • 11
  • I don't have an answer, but check out this other ServerFault question that discusses backup/restore operations at the physical level. Basically, the backup/restore doesn't change the order your data or indexes are laid out on disk, so something else has to be going on. http://serverfault.com/questions/33432/does-restoring-a-sql-database-from-backup-rebuild-its-indexes – MattB May 21 '10 at 15:29
  • belongs on StackOverflow because it's a coding question – gbn May 21 '10 at 18:22
  • 1
    Without an `ORDER BY` clause there is **no default order**. Any order is valid. The records may or may not be stored in a particular order within the database. The database may even be reorganized to a different order during maintenance. – Nate May 21 '10 at 18:32

3 Answers3

2

The order in which the records are returned depends on your indexing. Three things come into play:

  • what is the index being used in the query and is your identity column the main part of the index.
  • How fragmented the index being used is. If it is highly fragmented, then the records will most certainly be returned out of order.
  • Is the index you are using the clustered index on that table

The only way to guarantee the order of records returned is to use an ORDER BY statement. If you are depending on the specific order of the returned recordset, then you must use an ORDER BY statement.

Scott Lundberg
  • 2,364
  • 2
  • 14
  • 22
  • This is not true. How SQL Server reads in the data as part of the plan determines the "random" order. The engine does not account for index fragmentation – gbn May 21 '10 at 18:16
  • I don't agree. How it finds the data depends directly on the clustered index. If it can't find the data, then it can't return it in a select statement. The plan is related because of the way it joins to other tables and uses those indexes in the query, but if you did SELECT * FROM Table1, the plan consists of one entry: scan Table1. That scan will follow the clustered index. Here is one article explaining my position. http://www.sqlservercentral.com/articles/Indexing/sqlserver2000indexing/1479/ – Scott Lundberg May 22 '10 at 03:14
  • And if it's a parallel plan? – gbn May 22 '10 at 04:59
  • The OP was talking about a single table. That is and should be the context of our discussion. But to answer your question, I don't see why a parallel plan is any different... It's still going to do a table scan based on the best index or the index specified by the query author. Do you see it differently? Also, if you subscribe to SQL Server Central's newletter, there is very appropriate article titled "Clustered Indexes? Sedimentary, my Dear Watson". They haven't posted it to the site yet, but it will link it when they do. – Scott Lundberg May 22 '10 at 15:16
  • Another link with the same discussion: http://www.rhinocerus.net/forum/soft-sys-sas/482437-re-order-records-returned-selects-without-order-bys.html "it's going to return them in the order that the rows appear in the database, not the index. You will also get inconsistent returned data even with a clustered index-- unless you have performed a reorg of the database immediately before running your pull, you will have rows out of order due to the way that they're physically stored in the data (when you insert a new row, it goes at the end of the cluster closest to where it appears in the index)." – Scott Lundberg May 22 '10 at 15:23
  • Here is the sqlserver link: http://www.sqlservercentral.com/articles/Editorial/70320/ – Scott Lundberg May 23 '10 at 03:11
  • As in my post, I have already added an ORDER BY to resolve this. I was hoping for an intelligent discussion on the intricacies of the select statement - I also stated it was the SAME DB, PATCH VERION, and so on. I didn't explicitly say that the index on the table was the same - but they are the same backup so it is implied. – Dave May 23 '10 at 22:40
  • @Scott Lundberg: if you read my link by Conor Cunningham he states the engine does not account for index fragmentation. He works/worked on the "SQL Server Query Optimization Team". Also read this by the team who wrote the optimiser: http://blogs.msdn.com/queryoptteam/archive/2006/05/02/588731.aspx. Basically, no assumptions can be made no matter: the folk who wrote this bit of SQL Server say so. – gbn May 24 '10 at 04:47
  • @gbn: I have not and am not saying the engine takes it into account, in fact the reason that a fragmented index causes records to be returned out of order is because does not take fragmentation into account. It just returns the records as it finds them, out of order or not. So in conclusion, a fragmented index will most certainly cause the OP's issue of non-ordered records. – Scott Lundberg May 24 '10 at 13:14
  • @Dave: I believe the reason your test machine returns records in insert order, is because during the restore, your data is reinserted in clustered index order. You can verify by deleting and inserting some new records on your test machine, the records will start being returned out of order. – Scott Lundberg May 24 '10 at 13:24
  • @Scott: I'll give that a shot and see how it goes. The interesting thing imo, is that when I backup the database which returns with the 'wrong' order I can restore it to my test env, and to a new test DB on the client machine. On my test VM, it returns in order of insertion, but at client site, it returns an order true to the original DB... I'll try your suggestion and get back to you though, thanks for the information. – Dave May 25 '10 at 08:29
  • @Dave: Are you saying that the original client still returns out of order records when using a local (on the original client) DB only? If so, that's a head scratcher. I would be curious what you find out there. What happens when the client queries your test VM? Is this issue client specific for some reason? – Scott Lundberg May 25 '10 at 16:05
2

Ordering of a record set is determined only by the outermost ORDER BY.

For avoidance of doubt: there is no default sort order

The "reproducible" order you mentioned have only reflects the how SQL Server reads the data in the execution plan for your server at that time. Same service pack? Same edition? Same number of CPUs (including HT)? Exactly the same SET options?

If the plan changes, the order will probably change too. It had nothing to do with insert order or index order or disk location order.

This question belongs on StackOverflow and has been answered before:

From MSDN, Sorting Rows with ORDER BY

ORDER BY guarantees a sorted result only for the outermost SELECT statement of a query. For example, consider the following view definition

From Conor's blog ,

Does the SQL Server Query Optimizer consider index fragmentation in plan selection?

No, it does not directly care.

gbn
  • 6,079
  • 1
  • 18
  • 21
  • Further to your comment on my response... I think we are saying the same thing. The plan doesn't care about index fragmentation, so therefore it returns records as it finds them and in doing so follows the fragmentation. So if you insert a bunch of rows in a table and then read them out using an identity column that is your clustered index, then they will come out in the same order. If you insert/delete/change things over time, the index will become fragmented and the returned dataset follows that fragmentation. – Scott Lundberg May 22 '10 at 03:19
  • I know there is no default sort order, and I've already said I used order by. I know nothing should be assumed. It is 'reproducible'. Their must be a difference that causes this behaviour, and this is what I'm trying to understand. I've stated in the post same service pack, and the edition is also correct. I also know how an ORDER BY works, and as I've said in my post. What I'm trying to pin down, is why the same db instance, same service pack, same edition, same collation's etc give different behaviour. – Dave May 23 '10 at 22:51
1

After some extensive investigation, we found the route problem...

The client had an additional index on a table, which was throwing everythign out of sync, but only on their server. When tested locally, either a backup of the LDF/MDF or a restore onto a fresh database - the problem wasn't exhibited.

On the client machine, on the original mdf/mdf, after being backed up and restored, or on a fresh database which was restored onto, the issue is exhibited 100% reliably.

As proof that the index was the cause, we could remove and re-add the index and see the unwanted behaviour happening and then not happening.

Dave
  • 281
  • 1
  • 3
  • 11