High Physical Disk Queue on Exchange 2003 mail store

Question

I have a RAID 10 array of ten 7200 RPM SATA disks. My disk queue length during business hours is averaging around 100. The following is true of the setup:

The array has one mail store with 95 active mailboxes. (This is the only thing on it: no logs or system files.)
Average mailbox size is ~400 MB
The array is one large 1.3 TB partition that was aligned to the RAID stripe
The mail store is ~48 GB (for both the edb and stm files)
The mail store was just defragged
The transaction logs are on another array that has an average disk queue of less than 1

Does this seem high? If so, does anything seem wrong with this setup? Should I look at some other counters?

Updates after comments:

The array itself seems okay; just the other week it got good results from several days of Jetstress testing
There is no pagefile on the array :-)
There is Symantec AV software. The two highest processes by I/O Reads and I/O Writes in Task Manager are conduit.exe (Symantec antispam/antivirus) and store.exe. conduit.exe is at 18 million reads and 25 million writes; store.exe is at 144 million reads and 9 million writes. Since I have gateway servers, I am now looking into taking AV off the backend server.

As a first step, disable Symantec (a known cause of problems) and see if anything changes. — John Gardeniers, Jul 14 '09 at 20:12
Yup. That's why my first guess was AV software; Symantec in particular is notorious for pummeling store servers into the ground. Do your antispam/AV at the front end. — Jim B, Jul 14 '09 at 20:32
I've had E2K3 boxes running SAVMSE (version 4 and 5) that worked fine (both running "Premium AntiSpam" and not). In particular, I know of a box w/ 225 mailboxes running today on a 4 disk RAID-5 w/ 7,200 RPM Ultra160 SCSI disks that's still going strong. I don't love Symantec's AV products (actually, I rather hate them), but I've not had the experience with SAVMSE causing bad store performance. — Evan Anderson, Jul 15 '09 at 00:45

Evan Anderson · Accepted Answer · 2009-07-14T19:49:43.207

That's beyond "rather high"-- that's exceedingly, mind-blowingly high. I have boxes with over double that number of mailboxes running on RAID-5 on old-as-dirt 7,200RPM Ultra160 SCSI drives with much lower disk queues.

Something else besides Exchange is thrashing your disks. I'd throw open Perfmon and graph the "IO Data Operations / sec" in the "Process" object for each individual process and see what process is causing so much IO.

Edit:

The article you linked in your comment to Jim B has some very good perfmon counters to have a look at. I'm wondering, too, if you've put a virtual memory pagefile onto those disks and are seeing excessive paging.

I do have some suspicions, after reading the article and linked articles re: Entourage, that you might be having some issues associated with those clients. Outlook Anywhere (aka RPC over HTTP) isn't going to cause the same problems that Entourage will, though-- that's a different thing entirely (MAPI over HTTP, versus WebDAV which the Entourage clients use).

It goes w/o asking, but are you seeing anything odd in the event logs?

Edit after your update:

The total number of reads / writes isn't really what you're looking for. You're really looking for the delta in reads / writes on a per-interval basis. Throw open Perfmon, clear the default counters, and add some counters for:

Object: Process - Counter: IO Data Operations/sec - Instance: conduit.exe
Object: Process - Counter: IO Data Operations/sec - Instance: store.exe
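
To illustrate what "delta on a per-interval basis" means, here is a minimal Python sketch. The `Sample` type, the `io_rates` helper, and the readings are all hypothetical and only for illustration; the Perfmon counters above already report this kind of rate for you.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a process's cumulative I/O counters (hypothetical type)."""
    seconds: float  # time the reading was taken, in seconds
    reads: int      # cumulative read operations since the process started
    writes: int     # cumulative write operations since the process started


def io_rates(samples):
    """Return (reads_per_sec, writes_per_sec) for each consecutive pair of samples."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        elapsed = cur.seconds - prev.seconds
        if elapsed <= 0:
            raise ValueError("samples must be in increasing time order")
        rates.append(((cur.reads - prev.reads) / elapsed,
                      (cur.writes - prev.writes) / elapsed))
    return rates


# Made-up readings taken 60 seconds apart; only the differences matter.
store = [Sample(0, 144_000_000, 9_000_000),
         Sample(60, 144_030_000, 9_001_200)]
print(io_rates(store))  # [(500.0, 20.0)]
```

A process with huge lifetime totals can still have a low current rate, which is why the totals in Task Manager alone don't identify the culprit.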

You might also have a look at the Microsoft Exchange User Monitor (nice little article about its usage available at http://www.msexchange.org/tutorials/Microsoft-Exchange-Server-User-Monitor.html). This won't show WebDAV sessions, but it might give you some insight into what your traditional MAPI-based users are doing.

I added Data Operations / sec counters for every single process. I can't see any direct correlation between them and the disk queue for this particular array... sigh... — Kyle Brandt, Jul 15 '09 at 13:58
By using Process Explorer and filtering on file operations and H:\, I see the only process really accessing the disk is store.exe. And I probably should have added before that I am running BES. — Kyle Brandt, Jul 15 '09 at 14:35
I'd be curious to hear what the Exchange Troubleshooting Assistant has to say about the server: http://www.microsoft.com/downloads/details.aspx?familyid=4bdc1d6b-de34-4f1c-aeba-fed1256caf9a&displaylang=en — Evan Anderson, Jul 15 '09 at 15:11
"Moderately High RPC Usage. RPC Usage is distributed Across Multiple Users. Low space on Drive C:\ ." Watching the data today, I think I might have stated the disk queue too high. I was using Nagios with RRD to log the data and was taking the MAX of two snapshots of the disk queue taken every five minutes. It is generally around 0-20, but it will spike to an average of about 100 for about 10 minutes, with a max of 250. When I looked at Process Monitor during this time, it was still only store.exe. But maybe Symantec ties into store.exe somehow? — Kyle Brandt, Jul 15 '09 at 16:11
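
A small Python sketch of why taking the MAX of paired snapshots can overstate a mostly quiet queue. The snapshot values are invented to match the rough shape described in the comment above, and the pairing is an assumption about how the logging was consolidated, not a description of the actual Nagios/RRD configuration.

```python
from statistics import mean

# Hypothetical five-minute snapshots: mostly quiet, with one ten-minute spike.
snapshots = [5, 10, 20, 100, 250, 15, 10, 5]

# Taking the MAX of each pair of snapshots keeps only the worse reading.
pair_max = [max(a, b) for a, b in zip(snapshots[0::2], snapshots[1::2])]

print(pair_max)         # [10, 100, 250, 10]
print(mean(pair_max))   # 92.5
print(mean(snapshots))  # 51.875
```

With these made-up numbers, the average of the MAX values comes out well above the average of the raw snapshots, even though the queue is short for most of the period.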

score 2 · Answer 2 · answered Jul 14 '09 at 18:28

Whoa! That's very high. The average queue length should be equal to, or less than, the number of physical disk spindles, so your machine is thrashing about an order of magnitude above where it should be. This link has the list of all the Exchange operations that cause disk I/O, so along with Sam & Evan's suggestions you should verify that you don't have unusual amounts of any of those activities (like a mail loop).
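
As a rough check of that rule of thumb against the numbers in the question, here is a small Python sketch. The `queue_per_spindle` helper is a hypothetical name used only for illustration.

```python
def queue_per_spindle(avg_queue_length, spindles):
    """Average outstanding requests per physical disk in the array."""
    if spindles <= 0:
        raise ValueError("spindles must be a positive number")
    return avg_queue_length / spindles


# Figures from the question: average queue of about 100 on a ten-disk array.
ratio = queue_per_spindle(100, 10)
print(ratio)       # 10.0
print(ratio <= 1)  # False: about ten times the rule-of-thumb ceiling
```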

score 1 · Answer 3 · answered Jul 14 '09 at 18:15

That is rather high. Do you have any kind of AV software? Also look at \Process(*)\IO Data Operations/sec. That should tell you if it's store.exe or something else that's causing the I/O. If it's store.exe, I'd guess that something is scanning through your mailboxes.

Jim B

Yes, I do have AV (Symantec Mail Security) running on the server. Also, I have a lot of Entourage clients and laptops configured to use Outlook over HTTP. Is that WebDAV? Maybe this link has something to do with it? http://www.macwindows.com/041107d.html – Kyle Brandt Jul 14 '09 at 18:22
Yes, both can cause high disk usage – Jim B Jul 14 '09 at 19:33

score 1 · Answer 4 · answered Jul 14 '09 at 18:26

I would first download Process Explorer and use it to see which process is actually causing the high disk I/O: is it the information store or something else? Then you can go on from there.

It would be useful to know how much mail you are actually processing. 95 mailboxes is not a huge number, but if they're all in constant use then it could be an issue.

Also, you have probably already checked this, but is your RAID array OK? If it's rebuilding from a failed disk, that can cause significant disk I/O.

score 1 · Answer 5 · answered Jul 14 '09 at 19:14

You might also look at Process Monitor, which logs each access on the disk individually. If something other than Exchange is using your disks, you'll see it fill the list of disk accesses very quickly in ProcMon.

You don't mention how much RAM you have, although the specs you did mention indicate you probably have a reasonable amount. If you're running less than 2 GB, you might see this kind of behavior from the machine hitting the page file. Make sure the amount of RAM used in Task Manager is less than the amount of physical memory installed in the server.
