8

Have any of you experienced the following, and have you found a solution:

A large part of our website's back-end is MS SQL Server 2005. Every week or two the site begins running slower, and I see queries taking longer and longer to complete in SQL. I have a query that I like to use:

USE master
SELECT text, wait_time, blocking_session_id AS "Block",
       percent_complete, *
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2
ORDER BY start_time ASC

Which is fairly useful... it gives a snapshot of everything that's running right at that moment against your SQL Server. What's nice is that even if your CPU is pegged at 100% for some reason and Activity Monitor is refusing to load (I'm sure some of you have been there), this query still returns and you can see which query is killing your DB.

When I run this (or Activity Monitor) during the times that SQL has begun to slow down, I don't see any specific queries causing the issue - they are ALL running slower across the board. If I restart the MS SQL service then everything is fine, it speeds right up - for a week or two, until it happens again.

Nothing that I can think of has changed, but this just started a few months ago... Ideas?

--Added

Please note that when this database slowdown happens it doesn't matter whether we are getting 100K page views an hour (busier time of day) or 10K page views an hour (slow time); the queries all take longer to complete than normal. The server isn't really under stress - the CPU isn't high, the disk usage doesn't seem to be out of control... it feels like index fragmentation or something of the sort, but that doesn't seem to be the case.
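
For anyone wanting to check the fragmentation angle themselves, a generic query along these lines (a sketch, not necessarily the exact one we run) will list fragmentation per index:

-- Average fragmentation per index in the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC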

As far as pasting results of the query I pasted above, I really can't do that. The query above lists the login of the user performing the task, the entire query, etc. etc... and I'd really rather not hand out the names of my databases, tables, columns and the logins online :)... I can tell you that the queries running at that time are normal, standard queries for our site that run all the time, nothing out of the norm.

--March 24th

It's been about two weeks since the last reboot. I made several changes: I found a few queries where we were making heavy use of temp tables that were totally unnecessary and had our developers change how they were doing it. I resized some of the constantly (slowly but surely) growing databases to a sensible size for their growth. I also adjusted the autogrowth settings for everything to be more sensible (they were ALL set to 1MB growth). Lastly, I cleaned up MSDB a bit. We do log shipping and really didn't need to keep years and years' worth of backup points, so I've written some scripts that keep this to only a few months. I'll keep updating this thread, as it's too early to tell if the problem is solved yet.
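
Roughly what those last two changes look like in T-SQL; the database name, logical file name, growth size and cutoff date below are placeholders, not our actual values:

-- Set a fixed, sensible autogrowth instead of the 1MB default
ALTER DATABASE YourDBName
MODIFY FILE (NAME = YourDBName_Data, FILEGROWTH = 256MB)

-- Trim old backup/restore history out of msdb, keeping only the last few months
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = '2010-01-01'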

Ben Pilbrow
Dave Holland
  • If you run the same queries through Management Studio, do you see the same performance problems as when they are run through the application? What makes the performance degradation stop or go away? Do you reboot the server? Is this a physical server or a VM? Does it have its own storage or is it part of a SAN? – DCNYAM Mar 09 '10 at 15:06
  • Network attached storage, an MD 3000 to be exact. Restarting the SQL service makes it go away. Yes, you see the same slower response times from Management Studio during that time. – Dave Holland Mar 18 '10 at 14:34

9 Answers

3

We found it. It turned out that it was actually a web server that had a problem with one of its app pools. It would get stuck running the same set of queries over and over (which happened to deal in temp tables). It would just loop and loop and eventually cause the SQL server to be sad. Once this offending machine / app pool was found and 'put down', everything was resolved.

Dave Holland
2

You have to ask yourself, what happens at a SQL service restart? Lots of stuff, but two relevant points come to mind:

1) SQL memory is freed.

It's possible (not sure how likely) that if your MaxMemory setting is set too high, the SQL service grows to use all available memory, and Windows starts to swap important stuff out to the swap file. Check to make sure that MaxMemory is set to a reasonable value, leaving enough additional memory for whatever else needs to run on that box (is it a dedicated SQL server? Or is it also the app server?)
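
You can check and cap this with sp_configure; the 14336 MB value here is just illustrative, not a recommendation:

-- Cap SQL Server's memory use, leaving headroom for the OS (value is illustrative)
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max server memory (MB)', 14336
RECONFIGURE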

2) TempDB is rebuilt from the default sizes.

Check your default tempdb file sizes, especially the default size and growth increment of the tempdb log file. If the growth increment is set too LOW, then the log can build up some incredible internal fragmentation, which can dramatically slow down normal usage. See these two excellent blog articles by Kimberly Tripp.
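
A quick way to see the current tempdb file settings and how fragmented its log is (DBCC LOGINFO is undocumented but widely used; a very large row count means many virtual log files):

-- Current tempdb file sizes (size is in 8KB pages, so size/128 = MB) and growth settings
SELECT name, size/128 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files

-- One row per virtual log file; thousands of rows = heavy log fragmentation
USE tempdb
DBCC LOGINFO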

BradC
  • 1) The machine is a dedicated SQL server with 16GB of memory, with 14GB allotted to SQL. 2) I haven't had to reboot since I made some adjustments to DB size and growth. TempDB was included in the adjustments I made, so it's possible it had some impact. It's only been a few weeks, so I am waiting to see if the situation happens again. – Dave Holland Mar 24 '10 at 14:32
1

Do you make heavy use of temporary tables or cursors? Check that any cursors are being closed and deallocated correctly. Also watch out for linked servers - we've got to use a buggy driver for an old linked Informix server and it periodically means we have to reboot the server.
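
The usual cursor hygiene pattern, for reference (the cursor name and query here are placeholders):

-- Always CLOSE and DEALLOCATE, including on error paths
DECLARE example_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases

OPEN example_cursor
-- ... FETCH NEXT FROM example_cursor / processing loop goes here ...
CLOSE example_cursor
DEALLOCATE example_cursor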

MartW
  • We do use quite a few temp table calls; cursors I hope we don't use too often, but I suppose it *IS* possible knowing some of our older coding "standards", so I shall look into that. We are using linked servers, however only one, and it's to another SQL 2005 DB. – Dave Holland Mar 21 '10 at 04:03
0

If it looks weird then look for the weird.

If tweaking SQL Server settings doesn't help, try the Windows Task Manager: go to the Processes tab, then use the column options to add the CPU Time, Handles, Read, Write, Other and memory columns.

Go back to the process list. For each column, sort from highest to lowest and look at the top 5 processes. Anything out of the ordinary? e.g. a leaking process will show a bizarre number of handles. We have some *ki printers which add a handle to the DCSLoader process every 2 seconds. After a few weeks the machine lists lots of free memory and CPU, but one process has 100,000 handles and the box will barely move the mouse pointer.

Check your scheduled tasks list too. Tell your AV not to scan .mdf files.

jqa
  • Yeah, I've done all that; nothing in the process list is out of the ordinary, and as I've stated I do not reboot the machine... only restart the SQL service, and the problem is solved, so it's unlikely I'm going to find the issue outside of SQL Server processes. Looking at the handles is a good idea though, I'll check that next time. – Dave Holland Mar 18 '10 at 14:36
0

Dave,

Have you checked the wait stats? The query you gave above returns the 'last_wait_type' column; that column may have some details regarding what the queries are waiting for (network, CPU, etc.).
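
You can also look at waits aggregated server-wide since the last restart; a minimal sketch (the excluded wait types are just a few common benign ones, not a complete list):

-- Top waits since the last service restart
SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP', 'SQLTRACE_BUFFER_FLUSH')
ORDER BY wait_time_ms DESC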

SQLRockstar
0

If your backup "Recovery model" is FULL, does taking a backup of the DB and then a backup of the transaction logs improve things at all? On a system that is running out of disk space, this kind of thing might explain the problem.
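
In other words, something along these lines (the database name and paths are placeholders):

-- Full backup followed by a log backup, which allows the log to be truncated
BACKUP DATABASE YourDBName TO DISK = 'D:\Backups\YourDBName.bak'
BACKUP LOG YourDBName TO DISK = 'D:\Backups\YourDBName_log.trn'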

djangofan
  • All the DBs are log shipped every 15 minutes - which means the DBs and trans logs are backed up constantly, so it's not the issue... they are also all running on an MD3000 with about a terabyte of free space. – Dave Holland Mar 21 '10 at 04:02
  • Good to know. What method do your SQL clients use to connect to the SQL server? Still, lots of questions. Is the server 64-bit? – djangofan Mar 22 '10 at 15:44
  • The clients are .NET websites (toolbox.com) and yes, 64-bit. – Dave Holland Mar 24 '10 at 14:39
  • so, are your .net clients using the jdbc2.x driver and are they using integrated auth or not? – djangofan Mar 24 '10 at 16:06
0

I seem to have a configuration very similar to yours (16GB, upgraded to 32GB, an MD1000 with a terabyte of disks, dual quad-core Xeons).

The only thing that has helped me diagnose bizarre problems like that in the past is beta_lockinfo by Erland Sommarskog. Run it during a slow period and compare against normal times.

Also, I've had an insane number of problems with SQL 2005 before SP2, but SP3 is really stable.

Ricardo Pardini
  • Actually, I just remembered. Try using "Lock pages in memory". With CU4 for SP3, even SQL 2005 Standard can use it. See http://blogs.msdn.com/suhde/archive/2009/05/20/lock-pages-in-memory-now-available-for-standard-edition-of-sql-server.aspx – Ricardo Pardini Mar 27 '10 at 19:23
0

Hope this gives more useful info:

SELECT  D.text SQLStatement,
        A.Session_ID SPID,
        C.BlkBy,
        ISNULL(B.status, A.status) Status,
        A.login_name Login,
        A.host_name HostName,
        DB_NAME(B.Database_ID) DBName,
        B.command,
        ISNULL(B.cpu_time, A.cpu_time) CPUTime,
        ISNULL((B.reads + B.writes), (A.reads + A.writes)) DiskIO,
        A.last_request_start_time LastBatch,
        A.program_name
FROM    sys.dm_exec_sessions A
        LEFT JOIN sys.dm_exec_requests B
        ON A.session_id = B.session_id
        LEFT JOIN (
                   SELECT   A.request_session_id SPID,
                            B.blocking_session_id BlkBy
                   FROM     sys.dm_tran_locks AS A
                            INNER JOIN sys.dm_os_waiting_tasks AS B
                            ON A.lock_owner_address = B.resource_address
                  ) C
        ON A.Session_ID = C.SPID
        OUTER APPLY sys.dm_exec_sql_text(sql_handle) D
WHERE   DB_NAME(B.Database_ID) = 'YourDBName' -- Comment out line for all db's
ORDER BY ISNULL(B.cpu_time, A.cpu_time) + ISNULL((B.reads + B.writes), (A.reads + A.writes)) DESC

Make sure the DB is OK with:

DBCC CHECKDB ('YourDBName')     -- Checks the allocation and structural integrity of all the objects in the specified database
DBCC UPDATEUSAGE ('YourDBName') -- Reports and corrects page and row count inaccuracies in the catalog views

Keep an eye on logspace with:

DBCC SQLPERF(LOGSPACE)

If you see expansion going on, that will definitely slow things down. If you run this you will see your log space get nearer and nearer to 100%, then the log will expand and the percentage will shrink as it's got some space again. Hopefully you'll never get to see it expand before your backup kicks in and clears down the log.

Simon Hughes
  • When I run the first query I don't get any results - mostly because there really aren't blocking sessions that happen during these slow times... it's just that the queries all run slower in general. I ran through all the DBCC checks and updateusages and they looked good. As far as DBCC SQLPERF(LOGSPACE) the only DB that is ever even close to 100% (at 75%) is model and it never changes significantly, the log ship backups are taking care of the log size. – Dave Holland Apr 08 '10 at 17:50
-1

Mostly idiot configuration. Happens.

  • First, you should actually run an index defrag regularly as part of a maintenance run. Schedule it as an activity just before or after you make backups (a minimal sketch follows this list).

  • Second, do not autogrow your database and especially do not autoshrink it. Depending on load, autogrow / autoshrink are basically suicide settings.
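
A minimal sketch of that kind of maintenance step, with placeholder table and index names (not from the poster's schema):

-- Rebuild a fragmented index and refresh the table's statistics (placeholder names)
ALTER INDEX IX_Posts_UserId ON dbo.Posts REBUILD
UPDATE STATISTICS dbo.Posts WITH FULLSCAN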

I've pretty much never seen a SQL Server slow down like that. Can you post the results of that query under times of high stress? Are you sure nothing on your end overloads SQL Server at that time?

TomTom
  • To your first point: We have weekly (and some daily, depending on the tables) maintenance jobs that defrag indexes and update statistics. If you pull back the fragmentation info on the indexes, even when it's slow they are less than 2-3% fragmented. To your second point: We do not autoshrink - for sure. These databases hold user info / site content, etc. that is constantly increasing (not by a ton... these aren't huge databases), but if I don't let them autogrow, how are they supposed to be large enough? I'm going to add some details to the end of my post to address the last of what you said. – Dave Holland Mar 09 '10 at 14:55
  • 3
    Autogrow is not really a bad thing. Relying on it is, but having it enabled is a lot better than all changes to your database being stopped because it's at max size. – Shane Mar 09 '10 at 15:00
  • Right, which is what I figured. HOWEVER. I just looked at one of my databases - autogrow was on but set to 1MB growth *sigh* I'm guessing it was having to autogrow way too often. I set it to 5% - which is probably what it grows over a period of 2-3 months. I am hoping this may help. – Dave Holland Mar 09 '10 at 15:14
  • 2
    Growth by percentage usually isn't a good thing either. When your database becomes large, a 5% growth will be much larger than when the database first started off. 1MB is too small, but you should decide on a fixed MB growth rate based on the size and usage of your database. – DCNYAM Mar 09 '10 at 15:18
  • 1
    Autogrow is bad because it fragments the file with lots of small increments. It has a lot of negative implications. http://support.microsoft.com/kb/315512 Rather: set the files to a proper size, then run regular checks with a fill report. Make sure they don't overgrow. 1MB could be the possible culprit, btw... if it has to stop / grow / stop / grow while doing maintenance, you don't want to know the performance. – TomTom Mar 09 '10 at 15:21
  • Are you storing the absolute minimum amount of data in your database tables to support your app? In a past job life when I supported the database for a major car rental company, this was essential to keeping query times down. – jl. Mar 09 '10 at 15:57
  • Autogrow has bad points, but what's worse: a performance hit because it has to grow itself, or a complete performance shutdown because it's got nowhere else to go? It's a last defense that should not be relied on, but it shouldn't get you into too much trouble if you're properly monitoring your databases. – Shane Mar 09 '10 at 16:00
  • 1
    Autogrow is harmless provided it rarely happens. When it gets bad is when it's used as a substitute for proper sizing, which I suspect is what **TomTom** *really* means. Otherwise by all means use it. – Maximus Minimus Mar 09 '10 at 16:16
  • To add a little more - the largest DB is only 22GB followed by 13GB, 11GB, 9GB and then the rest are all under 5GB. They are in full recovery mode doing log shipping to another server as well, btw. – Dave Holland Mar 09 '10 at 16:53
  • Jl - No, we don't have any additional cruft in the database. It's all important data... the majority of it is user data and posts. The site is www.toolbox.com if you want to get an idea of what we do. – Dave Holland Mar 09 '10 at 16:56
  • Not sure where to go then. As in: you are better off with a MS support call and them logging into the database or getting performance logs when this happens next. I would really like to get some performance stats, memory stats (sql server internal) and locking lists. The server is fully patched up? I mean, not service pack wise - but with the latest cumulative update? – TomTom Mar 09 '10 at 18:45