
Our customer support came to us with a weird problem. Every day, some devices fail to start our application because an error occurs during the app launch phase.

While investigating the issue further I found the following post. Unfortunately it doesn't apply to my problem.

Windows CE deletes .NET CF on reset

The GAC is still present after a warm boot so the problem doesn't come from that corner.

Digging deeper, we found that the problem is caused by a corrupted log4net.dll. A software update runs right before the problem arises. The update basically does the following:

- Download the new version
- Reboot the device (warm)
- A vendor-specific CAB installer installs the CAB
- The device gets rebooted again
- A sysbackup runs
- Another warm boot
- Our own application is started

At the last step the app crashes with an error saying that the log4net assembly couldn't be found or is invalid (something like that). Comparing a working copy of the assembly with the one on the device showed that the log4net assembly gets corrupted somewhere during the software update. Oddly, the corrupted assembly starts with fragments of several different config XML files.
To me it looks a lot like something is off with a buffer or an offset. Has anyone seen such behavior before? At the moment we only see it on one of our two device types. Unfortunately the second device type struggles with other problems, so it is not as widely deployed as the first one, on which the error happens. Therefore we can't say whether the issue is limited to one device type.
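
For reference, the comparison between the working and the corrupted assembly can be sketched roughly like this. It is a desktop-side helper in Python run against files copied off the device, not code from our application; the function names and the 64-byte probe length are made up for illustration. The goal is to tell whether foreign data was inserted in front of the assembly (an offset problem) or written over its beginning (a buffer problem).

```python
PE_MAGIC = b"MZ"  # a valid PE file (.exe/.dll) starts with these two bytes


def first_difference(good: bytes, bad: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            return offset
    if len(good) != len(bad):
        # One file is a strict prefix of the other.
        return min(len(good), len(bad))
    return None


def describe_corruption(good: bytes, bad: bytes) -> dict:
    """Summarize how the bad copy differs from the known-good copy."""
    return {
        # False means the file no longer starts like an assembly.
        "has_pe_magic": bad[:2] == PE_MAGIC,
        "first_diff_offset": first_difference(good, bad),
        # Non-zero means bytes were added or lost, not just overwritten.
        "size_delta": len(bad) - len(good),
        # Offset at which the original header shows up in the bad file;
        # > 0 suggests shifted data, -1 suggests the header was overwritten.
        "good_header_found_at": bad.find(good[:64]),
    }


# Hypothetical usage with files pulled from the device:
#   from pathlib import Path
#   good = Path("log4net.good.dll").read_bytes()
#   bad = Path("log4net.device.dll").read_bytes()
#   print(describe_corruption(good, bad))
```

If `good_header_found_at` comes back as a multiple of the storage sector or cluster size, that would point towards a misplaced block write rather than an application-level bug, but that is only a guess until the numbers are in.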

Any suggestions on this one?

Marcel Dutt
  • I have seen such weird file corruption on devices that use the SD card and do a 'hard' warm boot. The problem is that the filesystem cache has not yet been written to the 'disk' when the 'hard' warm boot is executed. A 'hard' warm boot is one that calls kernelioctl directly. Usually the device vendor's SDK comes with another warm boot API that is 'nicer', and using that fixed the problem. But we have also seen devices where the vendor had to supply a 'better' filesys driver. – josef Dec 18 '14 at 04:18
  • Good approach. Since I wrote 90% of the HAL code, I know that this device in particular uses the vendor's warm boot executable. The API version never really worked, so we went down that road (even though I would prefer an API call...). But it's worth some further investigation. I could also imagine that the employee just presses the reset button on the back to speed things up a little (5 seconds timeout before reboot...). I guess that pressing the reset button triggers the described kernelioctl method (we usually only do that as a last-resort fallback in our code). – Marcel Dutt Dec 18 '14 at 08:39
  • If you remove the battery or press the reset button (hard reset) before shutting down or in the middle of file operations, then you may get corrupted files. Using the vendor's 'reset' procedure ensures that the drivers have enough time to save their data and write back the cache. – josef Dec 18 '14 at 13:14
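
The flush-before-reboot idea from the comments can be illustrated with a small sketch. It is written in Python for a desktop system, so it only shows the pattern and is not Windows CE code: `os.fsync` stands in for whatever flush mechanism the device and its filesystem driver actually offer (on Win32-style platforms `FlushFileBuffers` plays a similar role), and the helper names are hypothetical. The updater writes the file, forces it out of the cache, and records a digest that can be checked again after the next boot, before the application loads the assembly.

```python
import hashlib
import os


def write_and_flush(path: str, data: bytes) -> str:
    """Write data, force it to storage, and return its SHA-256 hex digest."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its cache to the device
    return hashlib.sha256(data).hexdigest()


def verify(path: str, expected_digest: str) -> bool:
    """Re-read the file and compare it against the recorded digest."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_digest
```

A check like `verify` at startup would not prevent the corruption, but it would turn a confusing assembly load error into a clear "file damaged, reinstall" condition.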
