0

I'm working on a product where I need to support both SQL Server and SQL Server CE (4.0).

I have two tables where one is an entity and the other is relation-table, something like this:

Customer
id
name
Order
id
customerId
created

I need to create a list of all customers together with the total number of orders.

Since there are limitations in SQL Server CE, for example one can't use sub queries in the select-statement I figured I could query like this:

SELECT [id]
      ,[name]
      ,ordersCount.total as totalOrders
  FROM [customer] as c
  LEFT JOIN(
        SELECT customerId, Count(o.id) as total FROM orders AS o GROUP BY customerId
  ) as orderCount ON c.id = orderCount.customerId
ORDER BY id
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY

This query works in both SQL Server CE and SQL Server and it's quite fast if the number of rows are low, but when we're having lot's or rows the query is getting really slow on SQL Server CE, even when using "FETCH NEXT 10 ROWS ONLY" to only fetch 10 rows. I guess that the LEFT-join is performed on all the rows even when I just fetch 10.

Executing the "parts" of the query takes less than 100ms,

SELECT customerId, Count(o.id) as total FROM orders AS o WHERE GROUP BY customerId

Takes 50ms on SQL Server CE

SELECT [id]
      ,[name]
      ,ordersCount.total as totalOrders
  FROM [customer] as c

Takes 80ms on SQL Server CE

But when I execute the whole query with the LEFT JOIN the query takes over 2 minutes (about 14 000 customers with 17-18 000 orders.

The execution plan indicates that "Nested Loops" takes out 100% of the time for the processing in SQL Server CE. But I can't seem to figure out why?

enter image description here

I have indexes on:

  • customer.id (PK. Includes: customer.id)
  • order.customerId (Included order.customerId)

I'm wondering if there is some way to make the query faster on SQL Server CE? And also, could I query in another way and get the same result but more efficient?

EDIT: I tried to divide the query into two queries, first getting the 50 customers that are about to be shown, then querying for the count.

SELECT [id]
      ,[name]
  FROM [customer] as c
ORDER BY id
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY

Then taking the ids from these 50 and query for the count:

SELECT customerId, Count(o.id) as total 
FROM orders AS o 
WHERE customerID IN ([ids from query above])
GROUP BY customerId

This executes in around 200ms.... it's sooo strange.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Please edit into your post what your indexes are, and share the query plan via https://PasteThePlan.com – Charlieface Jan 10 '21 at 01:55
  • @Charlieface I've already included the indexes (custom.id and order.customerId), tried to paste the plan in the link you provided but it said that the plan was invalid XML, but I can open it in SSMS. – Markus Knappen Johansson Jan 10 '21 at 02:01
  • Full index definitions, with `INCLUDE`s. Are you copying all of the XML, you may have missed a tag? Also the query plan on SQL Server so we can compare. At a guess, without seeing the full plans, a Merge Join would have been better, but for some reason wasn't chosen – Charlieface Jan 10 '21 at 02:08
  • You have either a typo or deleted text. What does `WHERE GROUP BY` mean – Charlieface Jan 10 '21 at 02:09
  • I just tried to switched "LEFT JOIN" to "INNER JOIN" which executed in less than 100 ms. But sometimes a customer can exist without an order. I feels so strange that it can such a difference between left join and inner join? I've updated to the full index-descriptions. – Markus Knappen Johansson Jan 10 '21 at 02:10
  • I wouldn't have expected a major difference between those two syntaxes. What is the difference between the query plans of those? – Charlieface Jan 10 '21 at 02:13
  • On SQL Server there is no difference but with SQLCE it 100ms for INNER JOIN vs. 2 minutes for LEFT JOIN. It looks from the plans like SQL Server is "narrowing" down the join-rows to only the rows that will be returned while SQLCE joins all rows and then returns 10. – Markus Knappen Johansson Jan 10 '21 at 02:18
  • But would it be possible to query in another way and get the same result? – Markus Knappen Johansson Jan 10 '21 at 02:19
  • Please share the full query plans and the **FULL** SQL for the queries. – Charlieface Jan 10 '21 at 02:59
  • >> Then taking the ids from these 50 and query for the count: - what is not working with that solution? – ErikEJ Jan 10 '21 at 08:12
  • Databases in CE edition have some special configurations like AUTOCLOSE that can be the cause of your trouble. – SQLpro Jan 10 '21 at 10:47
  • @SQLpro thanks for posting! Could you elaborate a little on this? What does that mean and how can I test to adjust these settings? – Markus Knappen Johansson Jan 10 '21 at 11:03
  • Verify with: SELECT name, is_auto_close_ON FROM sys.databases;. Correct your database with ALTER DATABASE MyDB SET SET AUTO_CLOSE OFF; – SQLpro Jan 11 '21 at 13:06

2 Answers2

1

First of all, thanks a lot to @ErikEJ who was kind enough to help me out privately. Such a nice and helping person!

ErikEJ pointed out one thing that I did not know that SQL Server CE only uses one index per table and question which might be good to know for the future.

After fiddling around and reading up on this I came up with the idea to try a "CROSS APPLY"-query.

Something like this:

SELECT 
    [id], [name],
    ot.total AS totalOrders
FROM 
    [customer] AS c
CROSS APPLY 
    (SELECT COUNT(o.id) AS total 
     FROM orders AS o 
     WHERE o.customerId = c.id) AS ot
ORDER BY 
    id
    OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

This query executes in about 1 ms on SQL Server and somewhere around 300 ms in SQL Server CE. In terms of performance, in this case, it's still around 225% faster to fetch the first 50 rows and based on them fetch the COUNT() with a group query. But if one needs to sort on the count (which id need) the CROSS APPLY-query seems to be the best option.

Thanks to everyone who took the time to help me!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
0

Split in to two queries, one to get the customer ids, and one to get the related orders and make sure you have good indexes for both queries.

If you a fetching all customers, consider fetching the count for all orders, to avoid a large IN condition.

Happy to help you investigate if you share database and queries privately.

ErikEJ
  • 40,951
  • 5
  • 75
  • 115
  • The "take the 50 ids"-solution works really good for this use case since the sort it not based on the count, but if it would have been I would still have needed the sort inside the query. I could sharee the SQLCE file and the query in some way privatly, don't want to post it all here - should I reach out of twitter or what do you prefer? – Markus Knappen Johansson Jan 10 '21 at 10:08
  • Email me, address in on Github – ErikEJ Jan 10 '21 at 14:07
  • can't seen to find your email on your github page, not on your blog either, and you have PM disabled on Twitter... =/ – Markus Knappen Johansson Jan 14 '21 at 16:51
  • Ohh, manage to find something on your old blog =D Hope that this one is the right one: ej*****@hotmail.com, I've dropped you an e mail. – Markus Knappen Johansson Jan 14 '21 at 16:57