3

My team is moving several hundred database objects from our QA server to our UAT server. All of the new objects are in two schemas that don't yet exist on the UAT database.

The first thing we do upon creating all these objects is run some very complex views to do an initial data population of our base tables (yes, the views are badly written, and it would be much more efficient to write them as stored procs, but this isn't the thrust of my question).

Most of our data load testing has been done in a database where all the objects already exist, and the performance is fine. The longest load is ten minutes.

However, when we move these new schemas/objects onto a different server and try the data load, the same code doesn't finish in 10 hours. However, if we try to run the underlying view separately, while we wait for the first exercise of the view to finish, it's very fast.

It seems obvious this is a problem with the optimizer (we've verified that the code is exactly the same between servers: we're not missing any indexes or stats). If we cancel our long-running queries and restart them, then the performance matches QA.

Here's what we've tried:

  • Rebuilding all the indexes.
  • Updating all the statistics with fullscan.
  • We've verified that SQL Server is not creating any additional statistics beyond what's automatically created by our indexes.
  • DBCC FREESYSTEMCACHE('ALL')
  • The new environment has better specs than the one we're porting from: more cores, more memory, plenty of space.

This stuff makes some difference, though it's still nowhere as quick as it is in our original environment.

What am I missing here? Beyond cleaning the optimizer, the indexes and the stats, what else could be driving the performance?

I also don't understand why the optimizer picks a terrible plan for the initial data load, but then, while that load is crawling along, it picks a great plan for the same code if I run that on another cnn.

SQL Version is SQL 2012, Svc Pack 3.

=========================

EDIT: I understand that nobody is going to be able to produce a piece of code that fixes my issue. I am curious if someone knows of an area I haven't explored yet. Examples:

  • The command I'm using to flush the cache is incomplete.
  • SQL Server keeps in memory stats that I'm not privy to (the dba I work with mentioned this, but did not provide a link, and I don't see anything online).
  • A link that explains how the optimizer generates one plan for a view select, but then provides a completely different plan on another cnn for the same view select simultaneously.
  • Some other type of metric that I need to look at that I haven't mentioned.
Quicksilver
  • 295
  • 4
  • 16
  • 4
    There are so many possibilities here that anything resembling an answer would be a shot in the dark. – Sean Lange Aug 25 '16 at 19:51
  • 1
    Okay -- then please tell me what topic I haven't covered. I don't think it is: (1) code differences (2) hardware differences (3) software differences (4) the plan cache (5) index differences (6) or missing stats. So enlighten me as to what else it could be. – Quicksilver Aug 25 '16 at 19:53
  • 2
    Did you captured the bad vs. good execution plan? – Remus Rusanu Aug 25 '16 at 19:55
  • 1
    Well who knows? That is the problem. We don't know either. It takes some analysis which means things like table definitions, index definitions, execution plans would be needed at a bare minimum. How are you loading the data? Remember that all we know about the problem is what you have posted. And it sounds like you have covered most of the typical situations we can't offer much because we have no way to analyze the situation. – Sean Lange Aug 25 '16 at 19:58
  • I'd have to agree with @SeanLange. Although you did a good job describing the problem and what you tried so far, without DDL scripts and execution plans we can't say anything. – NickyvV Aug 25 '16 at 20:09
  • I haven't tried capturing the bad plan, because the query never returns, hence no execution plan returns. I thought there was no point in looking at an Estimated Execution Plan because it could be very different. – Quicksilver Aug 25 '16 at 20:28
  • Are you running _one_ or _many_ views to populate? If you are running many, find the one that is the hold up. If you are running _one_ then optimise that. Now compare estimated plans between the two servers. The only way you will solve this is narrow down the issue. Then you can ask a specific question that someone may be able to answer. It could be something crazy like one is going parallel and one isn't.. – Nick.Mc Aug 25 '16 at 23:31
  • Check locked objects, any other running processes, compare server options, IO performance. You also can capture estimated execution plan in SSMS. So many places to look at... – Anton Aug 25 '16 at 23:42
  • Can you include your query because it could be a poor execution plan. Can you kindly provide and i'll see if i can pin it down – Dheebs Aug 26 '16 at 05:03
  • What I think you should do, is pose a question of a particular query that is going very slow. For that query, make sure that in your question you have the relevant table definitions, index definitions, view definitions, .... Also include the Actual Execution Plan. Perhaps by reviewing the execution plan you will already spot glaring errors in the way the queries/views are defined. Anyway, in your question you could reference your execution plan from [PasteThePlan](https://www.brentozar.com/pastetheplan/) (from Brentozar). That makes it easier for people around here to follow. GL! – TT. Nov 08 '16 at 18:30
  • Can you run your queries WITH RECOMPILE on? – J Brun Nov 08 '16 at 18:00

1 Answers1

1

I've seen faster hardware run slower because it can hit concurrency issues that don't crop up on the slower hardware, especially when more cores are introduced.

Check your MAXDOP settings. CXPacket waits can indicate parallelism contention. https://www.brentozar.com/archive/2013/08/what-is-the-cxpacket-wait-type-and-how-do-you-reduce-it/

Wes H
  • 4,186
  • 2
  • 13
  • 24