3

We are connecting to an SQL Server 2008 on another box using PHP 5.3 and pdo_dblib.

Periodically (10+ times a day randomly throughout the day) we experience 3 minute periods where all SQL Select queries will take 21.01 seconds to run, regardless of the query. This is timed from within PHP (before and after DB query).

The SELECT statements range from complex joins with optimization to statements with single tables and explicit indices.

If, during a PHP request, we perform three SELECT statements before closing the DB connection, the total time would be 63.03 seconds (21.01 * 3). The same applies for 2 statements (42.02 seconds).

Using SQL Server tools, we've tracked the actual execution times of these scripts to be between 0.01 to 0.45 seconds, but these differences don't seem to reflect in the overall delay (it is always fixed at 21.01, not 21.46, etc).

We did seem to find some correlation of WinHTTP Proxy triggering at the start of these delayed periods but we have since disabled WinHTTP Proxy with no resolution.

Any advice?

kevinnuut
  • 116
  • 4
  • Which SQL Server tools are you referring to? Profiler or SSMS? – PseudoToad May 23 '13 at 19:31
  • SSMS was used. All execution times were under 1 second. – kevinnuut May 23 '13 at 19:38
  • If you aren't already using them, you should consider monitoring standard perfmon counters and the server's wait stats. Those, when used with a good server side trace can usually pin the problem down. – PseudoToad May 23 '13 at 19:39
  • There are a ton of reasons that could cause this. Several are Poor Indexing, Index Fragmentation, Stale Statistics and IO contention (to name a few). Also, IO latency either from the network itself of from the amount of data requested and the way the app is using it is not uncommon. – PseudoToad May 23 '13 at 19:43
  • Even on SELECT with no joins and an explicit INDEX? These are transactions that normally take 0.01 seconds but at ten random moments in the day, for a 3 minute period, they take exactly 21.01 seconds (regardless of the SELECT). – kevinnuut May 23 '13 at 20:15
  • We are currently reviewing http://msdn.microsoft.com/en-us/library/ms175136.aspx since SQL Server 2008 defaults to 20 seconds and we've seen issues of the connection timing out then trying as another. – kevinnuut May 23 '13 at 20:16
  • Gizmo, thank you, I'm having a dev check perfom counters and wait stats. We've already checked blocking reports (no issues) and cpu times (no issues). – kevinnuut May 23 '13 at 20:25
  • How many rows does a typical SELECT return and are you certain that the index is being used? To find out, you should look at the actual (not estimated) query execution plan. If you are seeing scans instead of seeks, then the index is not being properly used. Also, how large is the table that is being queried and are you certain it is a table and not a view? – PseudoToad May 23 '13 at 20:58
  • I wouldn't recommend changing the timeout. That is like using duct tape. It might do the trick but you should find the underlying cause of the problem. – PseudoToad May 23 '13 at 20:59
  • These are SELECTs to return single rows but they are large tables. However, during these outages, the 21 second delays occur across partitions. We are reviewing network issues between the web server and database server at this time. We have reset the timeout as it had no effect. – kevinnuut May 24 '13 at 18:36

1 Answers1

0

When you look at your execution plan for these, what activity is taking the largest percentage of the resources? I'd specifically be on the lookout for Index Scans or Key/Bookmark/RID lookups. I'd also look at the lines connecting each activity and see how many records are being applied. Lastly, run the query using an Actual Query plan rather than an estimated one then look at the Actual vs Estimated rows returned for each action. They should be equal. If they are not, that's a good indication that your statistics are off.

PseudoToad
  • 1,504
  • 1
  • 16
  • 34
  • All the indexes are explicitly stated, so there is no plan execution estimator involved. We are fairly confident that it is stale connection related with the PCAP data in hand and there being no queries logged running more than 3 seconds (which were stored procedure queries). We updated the statistics and it still happened. – kevinnuut May 28 '13 at 20:27