
We have a mature Windows CE 6.0 R2 custom device that downloads files over WiFi and stores them in a FAT file system partition on NAND flash. This has been running on over 15,000 devices around the world for more than a year. Recently, however, on some test systems running new software and OS versions, we have been seeing file system corruption in which one particular directory appears to contain a recursive link back to the top-level \Flash contents. Specifically, we have a \Flash\Manifest directory that includes a subdirectory called GCMaps. Normally this contains a number of map images, but when the corruption occurs, it also includes all of the top-level \Flash files and subdirectories in an apparent recursive loop, e.g. \Flash\Manifest\GCMaps\program.exe and \Flash\Manifest\GCMaps\Manifest\GCMaps\Manifest...

It is always the same directory that has the problem, and it is happening on multiple devices on our test rack, although many of our test devices are completely unaffected. I can temporarily fix an affected device either by reformatting the file system partition or by erasing the entire flash device, repartitioning, reflashing the OS, and recreating the file system. However, the affected devices develop the corruption again within a couple of days.

Recent testing has shown that the file system remains intact after changing the Manifest files multiple times, but then we have an automated reboot at midnight, and upon bootup, some of the affected devices exhibit the problem.
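
To pin down exactly when the loop first appears, a check at boot could compare the two directory listings: if GCMaps contains every name found at the top level of \Flash, it is probably aliasing the root. The following is only a minimal sketch of that comparison in C. The function names are hypothetical, and the listings are passed in as plain string arrays; on the device they would have to be collected first (for example with FindFirstFile/FindNextFile) and case-normalized, since FAT names are case-insensitive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` occurs in `list` (exact, case-sensitive match;
 * callers are assumed to have normalized case beforehand). */
static bool contains_name(const char *const *list, size_t count,
                          const char *name)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: reports whether the subdirectory listing `sub`
 * contains every entry of the top-level listing `root`, which is the
 * symptom described above. An empty root listing never counts as a match. */
bool looks_like_root_alias(const char *const *root, size_t root_count,
                           const char *const *sub, size_t sub_count)
{
    if (root_count == 0 || sub_count < root_count) {
        return false;
    }
    for (size_t i = 0; i < root_count; ++i) {
        if (!contains_name(sub, sub_count, root[i])) {
            return false;
        }
    }
    return true;
}
```

Calling looks_like_root_alias() with the \Flash listing and the \Flash\Manifest\GCMaps listing right after boot, and again after each manifest change, would show whether the corruption is already present before the midnight reboot or only appears afterwards. This is a heuristic on names only; it does not inspect the FAT structures themselves.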

What is strange is that we have not recently changed any of the manifest download or integration logic, nor anything that has anything to do with GCMaps at all. One major change I have made recently was to remove the Windows Shell and run our devices in "Kiosk Mode" with our applications being the only UI.

Has anyone encountered this kind of recursive directory corruption on CE before, and if so, did you find a solution? Is there any reason that removing the shell could have caused this? Any suggestions or information would be appreciated!

Thanks, Rich Jones

rjones54
  • I'm skeptical that this is a flushing issue, since I am not missing files and the files themselves are not getting corrupted. Instead, I am getting a circular reference in the directory structure that always occurs under one particular subdirectory. Is it possible for FAT to get damaged during a read? Reading is all we are doing with the files in that particular directory when the corruption occurs. – rjones54 Sep 28 '11 at 12:27
  • If you want to add something to your question, edit it instead of posting a comment. – Eugene Mayevski 'Callback Sep 28 '11 at 13:07

1 Answer


FAT is prone to corruption if you don't flush the file system buffers before rebooting (or if you forcibly power off the device). This applies to both PCs and other devices that use FAT, so the reboot is probably what triggers the problem. Removing the shell may or may not be related: it's possible that the shell performed flushes periodically and that this protected you from the issue before.

Eugene Mayevski 'Callback
  • Thanks for the input! I neglected to mention that the time between the last writes and the reboots is many hours, so I wouldn't think that the filesystem buffers were still unflushed, unless the shell was responsible for that previously as you pointed out. I will certainly add in a forced flush, although I had also seen a suggestion to set "EnableCache" to "0" in HKEY_LOCAL_MACHINE\System\StorageManager\FATFS to suppress caching altogether. Does anyone know if this is a good idea? – rjones54 Sep 28 '11 at 12:24
  • @rjones54 Regarding disabling the cache: you need to measure performance, because the answer depends on your usage scenarios. – Eugene Mayevski 'Callback Sep 28 '11 at 13:06