
Just this month, we have started getting reports from a number of very stable clients that MRxSmb event ID 50 errors keep appearing in their System event logs. Otherwise, they do not appear to have any networking problems, except that a critical legacy application seems either to be generating the MRxSmb errors or to be failing because of them. The legacy application consists of 16-bit and 32-bit code and has not been changed or recompiled in many years. It has always been stable on Windows XP systems. The affected customers usually have a small peer-to-peer network (five clients or fewer) made up entirely of Windows XP systems, and all service packs are installed on the XP machines.

Note: The only thing that seems to correct the problem is disabling opportunistic locking. I don't like this solution because it seems to slow down the network and, on some networks, causes record-locking issues between users. Also, this seems to have just started happening, as if a Windows update for XP caused it. However, I have removed recent updates and that did not correct the issue.
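For reference, the opportunistic-locking workaround is usually applied through two registry values: one on the machine hosting the share and one on each client. The sketch below only generates `.reg` file text so the change can be applied selectively (for example, on the host only) and reverted later. The key paths and value names are taken from Microsoft's oplock configuration guidance (KB 296264) as I recall it; treat them as an assumption and verify them against that article before importing anything. The function name `oplock_reg_file` is hypothetical.

```python
# ASSUMPTION: key paths/value names as recalled from Microsoft KB 296264; verify first.
SERVER_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"
CLIENT_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MRxSmb\Parameters"


def oplock_reg_file(role, disable=True):
    """Return .reg file text that turns oplocks off (or back on) for one role.

    role: "server" for the XP box hosting the share, "client" for the others.
    """
    if role == "server":
        # EnableOplocks: 0 = oplocks off, 1 = on (the default)
        key, name, value = SERVER_KEY, "EnableOplocks", 0 if disable else 1
    elif role == "client":
        # OplocksDisabled: 1 = oplocks off, 0 = on (the default)
        key, name, value = CLIENT_KEY, "OplocksDisabled", 1 if disable else 0
    else:
        raise ValueError("role must be 'server' or 'client'")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        f'"{name}"=dword:{value:08x}\r\n'
    )


server_text = oplock_reg_file("server")
client_text = oplock_reg_file("client")
print(server_text)

# To save for regedit (which exports version 5.00 files as UTF-16), something like:
# with open("disable-oplocks-server.reg", "w", encoding="utf-16", newline="") as f:
#     f.write(server_text)
```

Calling `oplock_reg_file("server", disable=False)` produces the file that restores the default. As far as I know, the change does not take effect until the machine is rebooted (or the Server/Workstation service is restarted).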

Thanks in advance for any help you can provide.

Ben Pilbrow
Johnny Musso
  • Are you accessing files from a server which is remote to the computers, or is this application installed locally on all machines? – Ben Pilbrow Nov 03 '10 at 21:36
  • We are accessing the application across a local network via a shared folder on a Windows XP Pro machine. The clients are also Windows XP Pro. The error only occurs at clients. At the console of the hosting WinXP box, the errors do not occur. – Johnny Musso Nov 03 '10 at 21:56
  • can you please also clarify if you mean totally separate and isolated customers with no infrastructure or servers shared between them? – Ben Pilbrow Nov 03 '10 at 21:57
  • Sure Ben, it's usually a small office with 5 or fewer workstations, all with Windows XP Pro, all connected to a single switch (10/100), and all accessing the application from a shared folder on one of the Windows XP Pro machines. They usually also have a router with NAT enabled provided by their internet provider. It really couldn't be a simpler configuration. – Johnny Musso Nov 03 '10 at 22:04
  • OK, I have posted an answer for you to consider. If it's only a couple of customers, it could just be a nasty coincidence that a couple of hard disks are on the blink. If it's more than, say, 5 customers all having the problem (and it has only just started), then the chances of that many hard disks going flaky are slim. In any case, it might give you something to think about. – Ben Pilbrow Nov 03 '10 at 22:09
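Since the errors appear only on the clients and seem to have started recently, it may help to pin down the exact day they first appeared on each machine and compare that with update or driver install dates. The sketch below counts MRxSmb event ID 50 entries per day from a System log exported to CSV. It assumes, hypothetically, that the export has a header row with `Date`, `Source` and `Event` columns and US-style dates; rename the columns and change `date_format` to match your actual export. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_mrxsmb_50(csv_text, date_format="%m/%d/%Y"):
    """Count MRxSmb event ID 50 entries per day in an exported System log.

    ASSUMPTION: the CSV has a header row containing 'Date', 'Source' and
    'Event' columns; adjust these names to match your own export.
    """
    per_day = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Source"].strip().lower() == "mrxsmb" and row["Event"].strip() == "50":
            day = datetime.strptime(row["Date"].strip(), date_format).date()
            per_day[day] += 1
    # Sorted by date so the first entry is the first day the error appeared.
    return dict(sorted(per_day.items()))


# Made-up sample export, for illustration only.
sample = """Date,Time,Type,Source,Event
11/1/2010,09:15:02,Warning,MRxSmb,50
11/1/2010,09:15:40,Warning,MRxSmb,50
11/2/2010,14:03:11,Warning,MRxSmb,50
11/2/2010,14:05:00,Information,Tcpip,4201
"""

counts = count_mrxsmb_50(sample)
first_day = min(counts) if counts else None
print(counts)
print("first seen:", first_day)
```

If the first-seen dates line up across unrelated customers, that points toward something delivered to all of them at once (an update or driver) rather than local hardware.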

2 Answers


In my experience, those errors occur because of a network problem of some sort, since the write is going to a network share. My suggestion would be to see if there's some commonality in the network components of the affected clients:

NIC and driver recently updated?

OS Service Pack recently installed?

Windows update recently applied?

Network congestion - ARP flooding or network broadcast problem?

Misconfigured switch port - Are the switch ports and NICs hard-coded for speed and duplex settings, or are they set to Auto?

Malware infection?
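The checklist above boils down to comparing an inventory of each affected client and looking for what they all share. A minimal sketch of that comparison follows. The inventory keys (`nic`, `nic_driver`, `service_pack`, `duplex`, `updates`) are hypothetical field names, and the sample values (including the `KB-A`-style update IDs) are placeholders, not real data; fill them in from each customer's machines.

```python
def commonality(clients):
    """Return (shared_settings, shared_updates) across a list of client inventories.

    Each inventory is a dict with hypothetical keys such as 'nic', 'nic_driver',
    'service_pack' and 'duplex' (string values), plus 'updates', a set of
    installed update IDs.
    """
    if not clients:
        return {}, set()
    first = clients[0]
    # Only compare settings that every inventory actually recorded.
    keys = set.intersection(*(set(c) for c in clients)) - {"updates"}
    shared = {k: first[k] for k in sorted(keys)
              if all(c[k] == first[k] for c in clients)}
    # Updates installed on every affected machine are the prime suspects.
    shared_updates = set.intersection(*(set(c.get("updates", ())) for c in clients))
    return shared, shared_updates


# Placeholder inventories for two affected clients at different sites.
affected = [
    {"nic": "Vendor X 10/100", "nic_driver": "5.1", "service_pack": "SP3",
     "duplex": "auto", "updates": {"KB-A", "KB-B"}},
    {"nic": "Vendor Y Gigabit", "nic_driver": "8.0", "service_pack": "SP3",
     "duplex": "auto", "updates": {"KB-B", "KB-C"}},
]

shared, shared_updates = commonality(affected)
print(shared)
print(shared_updates)
```

Running the same comparison on a few unaffected machines and discarding anything they also share narrows the list further.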

joeqwerty
  • Driver updates do get deployed via Microsoft Update, so that could be a culprit for multiple machines doing this at the same time I guess. – Ben Pilbrow Nov 03 '10 at 22:40
  • I suppose it's possible that they all have the same type of NICs, but I seriously doubt it. I'll research that a bit, though. I rolled back Windows updates that had been installed within the month or so before this started, and it didn't seem to fix the issue. I could probably roll back more. Malware is doubtful because they all would have had to pick up the same malware at about the same time. The switches, etc. have been in place a long time and have had no issues. – Johnny Musso Nov 04 '10 at 13:59

I have experienced this error before, and the cause was a failed hard disk combined with a dodgy RAID controller.

We had a file server which was set up with a hardware RAID5, and one of the disks in the array failed. It failed in the middle of the day, and almost instantly we started getting loads of "Delayed write failed" errors on all our client machines trying to load/save files on that server. They still loaded files from the server and still appeared to save files fine (as you would expect with RAID5), but with these warning messages. A few hours later it gave out completely and we had to do a full restore from backup, so you might want to check that your backups are OK and be prepared to restore from them.

The fact that this has just started happening might indicate that a hardware failure on the server (specifically the disk subsystem) has either happened or is imminent. If you can run a S.M.A.R.T. test on the hard disk(s) in the computer, it would be wise to do so.
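In this thread the "server" is the XP machine hosting the shared folder, so that is the disk to check. One way to read S.M.A.R.T. data is `smartctl -A` from smartmontools, assuming it is installed on that machine. The sketch below parses that attribute table and flags the counters usually treated as early signs of a failing disk (reallocated, pending and offline-uncorrectable sectors). The column layout assumed here is the common ATA attribute table with the raw value in the tenth column; the sample text and its numbers are made up for illustration, and `smart_warnings` is a hypothetical helper.

```python
# Attribute IDs commonly watched as early warnings of disk trouble.
WATCH = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def smart_warnings(attr_table):
    """Return {attribute_id: raw_value} for watched attributes whose raw value is > 0.

    ASSUMPTION: attr_table is `smartctl -A` output where each attribute row has
    10+ whitespace-separated fields and the raw value starts in the 10th field.
    """
    warnings = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that isn't an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, raw = int(fields[0]), fields[9]
        if attr_id in WATCH and raw.isdigit() and int(raw) > 0:
            warnings[attr_id] = int(raw)
    return warnings


# Made-up sample output, for illustration only. On a real machine the text could
# come from: subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout
sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       112
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13421
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

for attr_id, raw in smart_warnings(sample).items():
    print(f"{WATCH[attr_id]} (ID {attr_id}) raw value {raw}: investigate this disk")
```

Non-zero counts on these attributes do not prove the disk is the cause of the event ID 50 errors, but they would justify checking the backups first.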

Ben Pilbrow
  • Thanks, but there is no raid configuration and it's happening at multiple client locations with totally separate systems. So far, we've had 10 clients or so in different parts of the country experience the problem. – Johnny Musso Nov 03 '10 at 22:08
  • Hmm, it does seem *incredibly* unlikely that 10 different customers' hard disks are on the blink all at once. – Ben Pilbrow Nov 03 '10 at 22:10
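The "incredibly unlikely" intuition can be made concrete with a binomial calculation, assuming disk failures at different customers are independent. The 3% annualized failure rate below is a hypothetical round number chosen for illustration, not a measured figure, and spreading it evenly over 12 months is a further simplification.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent disks fail in a window), each failing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# HYPOTHETICAL assumption: ~3% annualized failure rate, spread evenly over 12 months.
p_month = 0.03 / 12

# All 10 affected customers' disks failing in the same month.
print(prob_at_least(10, 10, p_month))
```

Even if the vendor has many more customers than the 10 reporting the problem (try `prob_at_least(10, 100, p_month)`), the result stays vanishingly small, which supports looking for a shared cause such as an update or driver rather than coincident disk failures.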