2

I am currently fiddeling around with the AdventureWorks sample database and LinqPad, for scratching out some ideas.

This is the query in question:

SalesOrderHeaders.GroupBy (soh => new {soh.CustomerID, soh.BillToAddressID})
                 .Where(soh => soh.Skip(1).Any())
                 .Dump();

The idea was to find duplicates based on some criteria and then display them except the first set of data. The result should be deleted from the table.

After executing the query I get result A) First Result

After executing the query again I get Result B) enter image description here

I do not care about the correct results of the query, but about the ordering of the resultset. Only those two possibilities exist and they alternate on every run of the query. Surely I could just order by Key, but I am more interested in why does this even occur? Why is the order chaning/alternating?

Marco
  • 22,856
  • 9
  • 75
  • 124
  • 1
    Your expectation leads to the general principle that data presented to users should be ordered even if there is no other reason to order it nor any logical ordering between rows. (Or course, this doesn't apply to developer tools like LINQPad and SSMS.) – Tom Blodget Sep 19 '14 at 18:52

2 Answers2

7

Sql server select query's result set order is not deterministic. It's just how sql server works and it's not a bug in neither linq or linqpad. The only way to get deterministic results on your query, as yourself noted, is to use an OrderBy clause.

Edit: About getting same results in SSMS if you run the query multiple times, see this. This post explains why you might get the same results if you execute a query multiple times and why you shouldn't rely on it.

Community
  • 1
  • 1
brz
  • 5,926
  • 1
  • 18
  • 18
  • There's another workaround I just found. Take the generated SQL code and execute it in SSMS. Then the order is set. But why are there excactly 2 different results and not 3, 4 or 5? – Marco Sep 19 '14 at 09:09
  • I believe SSMS does internal ordering of data using some unique row id / data id, as it fetches it, that's why it is able to give same order of rows for every select statement run multiple times. For data fetched through provider in the programming paradigm, there's no such internal ordering. Also the number of orders would depend on number of rows fetched by you, if you have good number of records you would see higher order uncertainty. – Mrinal Kamboj Sep 19 '14 at 13:10
1

Ordering is never deterministic as suggested earlier, but try and insert orderby clause in either your sql query or the Linq query, that's the only way to make it deterministic.

In fact let's look little deeper in the database for the reason. DB would be fetching all the data from the disk via i/o. data is stored in the internal sql server structures like pages, extents, segments (these are Oracle data blocks, I hope sql server have the similar stuff). Now when the query is fired, database would know all the different locations to fetch the data from, but this is not a serial operation, it is sort of parallel fetch, where different datasets are then combined to provide a user view. Now as we know in case of threads , it can never be deterministic that who goes first and who returns first, that's totally OS scheduling of the threads, hope that clarifies further little more.

OrderBy clause will work on fetched data to make in a particular order, so will always yield deterministic result.

Mrinal Kamboj
  • 11,300
  • 5
  • 40
  • 74