Based on the symptoms described so far, the most likely explanation is that your hard disk has some bad sectors.
You can try using ddrescue to copy all the good sectors to a new disk. That should recover every file that can still be recovered; the remaining files will return incorrect data when read.
The reason you see a high load average is that once you try to read a bad sector, the drive will try very hard to read that sector. Meanwhile everything else trying to access the disk has to wait. On Linux, the load average counts processes blocked waiting for disk I/O as well as those waiting for the CPU, so everything queued behind the stalled read is included.
As soon as you get the EIO error, the load will quickly go down. But since the load average is an exponentially decaying average, the reported number will stay high for a while after the actual load is gone.
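To illustrate why the number lingers, here is a minimal sketch of an exponentially decaying average. It assumes the 1-minute load average is updated every 5 seconds with a decay factor of exp(-5/60), which is how Linux is commonly described as computing it; the function names and the example figures (10 stalled tasks, a 2-minute stall) are made up for illustration.

```python
import math

SAMPLE_INTERVAL = 5.0   # assumed seconds between updates
WINDOW = 60.0           # time constant of the 1-minute load average


def update_load(load, active, interval=SAMPLE_INTERVAL, window=WINDOW):
    """Blend the previous average with the current number of waiting tasks."""
    decay = math.exp(-interval / window)
    return load * decay + active * (1.0 - decay)


def simulate(stalled_tasks, stall_seconds, recovery_seconds):
    """Return (time, load) samples for a stall followed by an idle period."""
    load = 0.0
    history = []
    stall_steps = int(stall_seconds / SAMPLE_INTERVAL)
    recovery_steps = int(recovery_seconds / SAMPLE_INTERVAL)
    for step in range(stall_steps + recovery_steps):
        # While the disk is stuck, every task waiting on it counts as load.
        active = stalled_tasks if step < stall_steps else 0
        load = update_load(load, active)
        history.append(((step + 1) * SAMPLE_INTERVAL, load))
    return history


if __name__ == "__main__":
    for t, load in simulate(stalled_tasks=10, stall_seconds=120, recovery_seconds=60):
        print(f"t={t:5.0f}s  load={load:5.2f}")
```

In this model, a full minute after the stall ends the reported value has only dropped to about 37% (a factor of 1/e) of its peak, even though nothing is waiting on the disk any more.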
The reason fsck doesn't report any problems is that it checks the logical integrity of the metadata. In order to do so it doesn't need to read the actual contents of any files. Reading the contents of every file would be far too slow for the normal usage of fsck.
Once you have tried to read a file and gotten an error, you should be able to verify what happened by looking in the kernel log (either by running dmesg or by looking in the log files).
Trying to read every file on the disk is one way to find all affected files, but it isn't the fastest. Careful interpretation of the output from ddrescue is probably the fastest way to identify which files are affected.
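The slower approach of reading every file could be sketched roughly as follows. This is only an illustration: the function name find_unreadable_files and the mount point in the usage line are hypothetical, and the script simply records any file whose read fails with an OSError (a bad sector would normally show up as errno.EIO).

```python
import errno
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every regular file under root and return (path, errno) for failures."""
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, device nodes, sockets and other special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    # Read the whole file in chunks and discard the data.
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                failed.append((path, exc.errno))
    return failed


if __name__ == "__main__":
    # "/mnt/suspect" is a placeholder for wherever the filesystem is mounted.
    for path, code in find_unreadable_files("/mnt/suspect"):
        label = "EIO" if code == errno.EIO else errno.errorcode.get(code, str(code))
        print(f"{label}\t{path}")
```

Keep in mind that each bad sector hit this way will stall the disk in the manner described above, and that data already held in the page cache may be returned without touching the disk at all.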