4

Earlier this morning, store.exe fuzzled up in one way or another, which necessitated a restart of our Exchange server. It came back online with no errors or problems, all the transaction logs replayed successfully, and all the stores mounted as normal. To me, it was just one of those random crashes; however, our consultant suspects it was caused by corruption in one of the stores. Perhaps he's correct, since he has far more experience than me, but that's not the point.

To fix the suspected corruption, he's planning to run an ESEUTIL defrag (via PerfectDisk), which he claims will also fix any errors present.

From what I understand, defrag, verify, and repair are 3 separate actions, and a defrag does not imply any kind of integrity check. Is this correct? Are there any dangers of running a straight-up defrag on a database that might be corrupt?

Edit:

Here's the first error in the event log, which indicated the start of the problems we were having. Anyone know what it might indicate?

Event Type: Error
Event Source:   Microsoft Exchange Server
Event Category: None
Event ID:   1000
Date:       11/23/2011
Time:       8:15:47 AM
User:       N/A
Computer:   SERVER
Description:
Faulting application exsp.dll, version 6.5.7638.1, stamp 430e735b, faulting module kernel32.dll, version 5.2.3790.4480, stamp 49c51f0a, debug? 0, fault address 0x0000bef7.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 00 70 00 70 00 6c 00   A.p.p.l.
0008: 69 00 63 00 61 00 74 00   i.c.a.t.
0010: 69 00 6f 00 6e 00 20 00   i.o.n. .
0018: 46 00 61 00 69 00 6c 00   F.a.i.l.
0020: 75 00 72 00 65 00 20 00   u.r.e. .
0028: 20 00 65 00 78 00 73 00    .e.x.s.
0030: 70 00 2e 00 64 00 6c 00   p...d.l.
0038: 6c 00 20 00 36 00 2e 00   l. .6...
0040: 35 00 2e 00 37 00 36 00   5...7.6.
0048: 33 00 38 00 2e 00 31 00   3.8...1.
0050: 20 00 34 00 33 00 30 00    .4.3.0.
0058: 65 00 37 00 33 00 35 00   e.7.3.5.
0060: 62 00 20 00 69 00 6e 00   b. .i.n.
0068: 20 00 6b 00 65 00 72 00    .k.e.r.
0070: 6e 00 65 00 6c 00 33 00   n.e.l.3.
0078: 32 00 2e 00 64 00 6c 00   2...d.l.
0080: 6c 00 20 00 35 00 2e 00   l. .5...
0088: 32 00 2e 00 33 00 37 00   2...3.7.
0090: 39 00 30 00 2e 00 34 00   9.0...4.
0098: 34 00 38 00 30 00 20 00   4.8.0. .
00a0: 34 00 39 00 63 00 35 00   4.9.c.5.
00a8: 31 00 66 00 30 00 61 00   1.f.0.a.
00b0: 20 00 66 00 44 00 65 00    .f.D.e.
00b8: 62 00 75 00 67 00 20 00   b.u.g. .
00c0: 30 00 20 00 61 00 74 00   0. .a.t.
00c8: 20 00 6f 00 66 00 66 00    .o.f.f.
00d0: 73 00 65 00 74 00 20 00   s.e.t. .
00d8: 30 00 30 00 30 00 30 00   0.0.0.0.
00e0: 62 00 65 00 66 00 37 00   b.e.f.7.
00e8: 0d 00 0a 00               ....    
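The `Data:` bytes in the event above are UTF-16LE text that mostly repeats the description, so decoding them adds no new information. As a minimal sketch of how to check that, the snippet below decodes the first five rows, copied verbatim from the event. The function name `decode_event_data` and the assumption that each row holds at most eight bytes are illustrative choices, not part of any Exchange or Windows tooling.

```python
import re

# First five rows of the "Data:" section from the event above, copied verbatim.
EVENT_DATA = """\
0000: 41 00 70 00 70 00 6c 00   A.p.p.l.
0008: 69 00 63 00 61 00 74 00   i.c.a.t.
0010: 69 00 6f 00 6e 00 20 00   i.o.n. .
0018: 46 00 61 00 69 00 6c 00   F.a.i.l.
0020: 75 00 72 00 65 00 20 00   u.r.e. .
"""

# A row is: 4-digit hex offset, a colon, then up to eight 2-digit hex bytes.
# The printable column on the right is ignored.
ROW = re.compile(r"\s*[0-9a-fA-F]{4}:((?:\s[0-9a-fA-F]{2}){1,8})")


def decode_event_data(dump: str) -> str:
    """Collect the hex bytes from every row and decode them as UTF-16LE."""
    raw = bytearray()
    for line in dump.splitlines():
        match = ROW.match(line)
        if match:
            raw.extend(bytes.fromhex(match.group(1)))
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    print(repr(decode_event_data(EVENT_DATA)))
```

Feeding in the whole dump works the same way and yields the same faulting application and module names that already appear in the description.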
Bigbio2002

2 Answers

6

An offline defragmentation using eseutil will fail if it encounters page-level corruption in the database. You'd have to use the /p option (rePair) to discard corrupt pages.
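To make the distinction between the modes concrete, here is a minimal sketch that maps each operation to its `eseutil` switch and builds the command line without running it. The helper name `eseutil_command` and the database file name `priv1.edb` are assumptions for illustration; substitute the real path to your store.

```python
# Switches for the eseutil modes that take a database file as their argument.
ESEUTIL_MODES = {
    "header": "/mh",    # dump the database header (read-only)
    "integrity": "/g",  # integrity check (read-only)
    "defrag": "/d",     # offline defragmentation into a new database file
    "repair": "/p",     # repair, discarding pages that cannot be fixed
}


def eseutil_command(mode: str, database: str) -> list[str]:
    """Return the argument list for one eseutil mode; nothing is executed."""
    if mode not in ESEUTIL_MODES:
        raise ValueError(f"unknown eseutil mode: {mode!r}")
    return ["eseutil", ESEUTIL_MODES[mode], database]


if __name__ == "__main__":
    # "priv1.edb" is a hypothetical database file name.
    for mode in ("header", "integrity", "defrag", "repair"):
        print(" ".join(eseutil_command(mode, "priv1.edb")))
```

The point of keeping the modes separate is that only the header dump and the integrity check leave the database untouched, so they are the safe ones to run first on a dismounted store.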

Corruption of data at a logical level (think damage to the "data" in the database, not the "structure" of the database) cannot be repaired by eseutil. The isinteg tool can find logical inconsistencies in the database in versions of Exchange up to 2007. In Exchange 2010 isinteg was replaced with the New-MailboxRepairRequest cmdlet (more details are available on the Exchange Team blog).

Having said all that, I'm concerned about your consultant's advice. Are you seeing events in the Application Event log from ESE or Exchange-related services that indicate any database corruption? In general, Exchange doesn't "randomly crash", and a problem with a hardware driver or the hardware itself seems to be a more likely cause than a problem with Exchange. Further, since the logs replayed without issue, I find it a bit unlikely that you're taking page-level corruption.

Finally, if you are taking page-level corruption, just cleaning that corruption up isn't a solution. You need to find the root cause (faulty hardware, etc.) that's causing the corruption and eliminate it. Doing anything else just exposes you to continued risk.

The offline defragmentation isn't, by itself, a major risk. You must take a full backup immediately after the offline defragmentation completes, because all prior incremental and differential backups can no longer be restored (the new database is just that: a brand new database). Obviously, your server will be inaccessible to users during the defragmentation period, too.

I'd be researching what happened this morning in detail and coming to a root cause conclusion (or at least a very likely hypothesis) before I started spending money "fixing" it.

I had a recent case where an Exchange Server 2003 machine wouldn't take VSS snapshots and reported various JET errors during attempted backups. I opted to spin up a new Exchange installation on another machine, move all the user mailboxes over to the new machine, then delete the problematic database on the original server and allow a new one to be created. After we did some stress testing and verified that the original server was functioning properly we moved all the mailboxes back. I'd prefer that strategy in the situation you're describing (if I had sufficient Event Log messages that indicated real "corruption" in the original Exchange Server computer's mailbox database).

Edit:

The entry you posted above is a fault in the Exchange provider for Microsoft Search (which makes full-text indexes of Exchange databases). I'd be interested to see more of what happened afterward, as well as any events immediately preceding this one from the System Event Log. Did you have a low disk space condition on any of the server computer's volumes?

Evan Anderson
  • It seems that his suspicion of corruption is a "gut feeling" from his prior experience. While there are event log entries about the crash, it's just a couple of faults in store.exe and exsp.dll. There's nothing that indicated database corruption (from my POV). I can add the event log details to my OP if you'd like. – Bigbio2002 Nov 23 '11 at 23:36
  • @Bigbio2002: I wouldn't mind seeing them though they might not tell us anything concrete. I'd be much more apt to run `eseutil` in integrity mode before I'd launch into an offline defragmentation. – Evan Anderson Nov 23 '11 at 23:43
  • That's what I think too; however, he's taken ownership of this issue and insists on performing a defrag himself. He said this is to buffer me from liability in case anything goes wrong, but I'm the clingy possessive type and don't want anybody doing anything to "my" network without my go-ahead. – Bigbio2002 Nov 23 '11 at 23:49
  • @Bigbio2002: At the end of the day, you're the "owner" of the situation, not the consultant. Anyone whom I hire to do a job for me does so under my purview and with my permission and approval. If you're not comfortable with his intended course of action then put your foot down with him/her or with your boss/manager/owner if need be. Performing some action based on a "gut" feeling instead of based on evidence that supports his course of action is not and should never be acceptable to you or your boss/manager/owner, especially in regards to the Exchange mailbox database(s). – joeqwerty Nov 24 '11 at 00:23
  • He used to be the guy in my position, but he quit and became a consultant, so he's been training and looking over me. He's hired by the company, not by me, though I'm the only IT guy. I admit I've been a sysadmin for barely a year, but I'm highly intelligent and a quick learner. He has lots of experience, and I do respect him, but I feel that I've progressed to the point where I can handle these problems on my own. While I wouldn't have suspected store corruption on my own (that ability comes with experience, if in fact it is actually corrupted), I can handle applying the solution myself. – Bigbio2002 Nov 24 '11 at 00:39
  • Gotcha. I'm certainly not knocking him, and having a certain "gut" feeling about something can often turn out correct but in the end I'm going to look for evidence to support my "gut" feeling and in the absence of such evidence I'd have to say that running a "preemptive" cycle of eseutil seems off the mark. Exchange is pretty good about logging database problems to the event log so if you're not seeing anything in the event log then I'd be inclined to push back on this until there's evidence to support his theory. – joeqwerty Nov 24 '11 at 00:46
  • I've got the "book smarts", but he's got the "street smarts", at least as far as IT goes. I'm good with noticing aberrations myself, and nothing in the event log jumped out at me at all. Though there's almost certainly a root cause that's fixable, I see the whole thing as just a hiccup, not some systemic database corruption issue. Thanks for all your advice, I give you much respect too! – Bigbio2002 Nov 24 '11 at 00:54
0

ESEUTIL defragmentation is not designed for extensive Exchange database repair. Its function is to reclaim free space in the database and optimize performance by creating a new, compacted database file.

While the defragmentation runs, it may also perform certain repairs on the database when it finds inconsistencies. This is part of the overall defragmentation process and can fix minor problems.

If your Exchange Server database is corrupt, first run the ESEUTIL /mh command to check the state of the database. If it reports that the database is in a dirty shutdown state, you can then use the ESEUTIL /P or ESEUTIL /R commands, depending on the damage. Make sure you take a backup of the database before attempting any repair operation.
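The header check can be scripted if you capture the ESEUTIL /mh output to text. The sketch below is hedged: it assumes the header dump contains a line of the form `State: Dirty Shutdown` or `State: Clean Shutdown`, and the function names `database_state` and `needs_recovery` are made up for illustration. The one-line sample strings are illustrative input, not output captured from a real server.

```python
import re
from typing import Optional

# Assumed format: a line such as "State: Dirty Shutdown" somewhere in the dump.
STATE_LINE = re.compile(r"^\s*State:\s*(.+?)\s*$", re.MULTILINE)


def database_state(header_dump: str) -> Optional[str]:
    """Return the text after 'State:' in a header dump, or None if absent."""
    match = STATE_LINE.search(header_dump)
    return match.group(1) if match else None


def needs_recovery(header_dump: str) -> bool:
    """True when the reported state mentions a dirty shutdown."""
    state = database_state(header_dump)
    return state is not None and "dirty" in state.lower()


if __name__ == "__main__":
    # Illustrative one-line fragments, not real eseutil output.
    print(needs_recovery("      State: Dirty Shutdown\n"))
    print(needs_recovery("      State: Clean Shutdown\n"))
```

A clean shutdown state means there is nothing to recover or repair at this level, which is a useful fact to have before anyone schedules downtime.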

I advise consulting Microsoft support to ensure proper recovery steps.

You can refer to these Microsoft articles:

https://techcommunity.microsoft.com/t5/exchange-team-blog/repairing-exchange-databases-with-eseutil-when-and-how/ba-p/610276

https://social.technet.microsoft.com/wiki/contents/articles/53450.how-to-check-exchange-database-health.aspx