Unable to read some files on hard disk

Question

Our server has two independent hard disks without RAID.

One hard drive is used just for storing junk/temporary data, which we generally delete using an LRU algorithm.

We have about 10k files on that hard disk, and some of them are not readable (we don't know why).

We have already run fsck on this disk. On the first run it said it fixed some errors, but the issue persists.

Whenever we try to read one of these unreadable files, our load average goes up:

cp: overwrite `/tmp/t.mp4'? y
cp: reading `mq/full/68156.3gp': Input/output error

Is it possible to find a list of unreadable files?
What causes this issue?
How can we solve this issue?

score 1 · Accepted Answer · edited Apr 13 '17 at 12:22

Based on the symptoms described so far, the most likely explanation is that your hard disk has some bad sectors.

You can try using ddrescue to copy all the good sectors to a new disk. That should recover every file that can be recovered; the files that touch bad sectors will return incorrect data when read.
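ddrescue can record which areas it failed to read in a mapfile (its optional third argument, as in `ddrescue infile outfile mapfile`). The following is a minimal sketch, not an official parser, that pulls the ranges marked `-` (bad sector) out of that file and converts them to filesystem block numbers. It assumes the mapfile uses ddrescue's usual hex `pos size status` lines, that ddrescue was run on the partition itself (so offsets are relative to the filesystem), and a 4096-byte block size; the function names are hypothetical.

```python
"""Sketch: list the areas a GNU ddrescue mapfile marks as bad ('-')."""
import sys
from typing import List, Tuple


def bad_ranges(mapfile_text: str) -> List[Tuple[int, int]]:
    """Return (start_byte, size_bytes) for every block with status '-'."""
    ranges: List[Tuple[int, int]] = []
    seen_status_line = False
    for raw in mapfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if not seen_status_line:
            # First data line is "current_pos current_status [current_pass]".
            seen_status_line = True
            continue
        pos, size, status = line.split()[:3]
        if status == "-":
            # Base 0 lets int() accept the 0x-prefixed hex that ddrescue writes.
            ranges.append((int(pos, 0), int(size, 0)))
    return ranges


def to_fs_blocks(start: int, size: int, block_size: int = 4096) -> range:
    """Filesystem block numbers overlapped by a byte range.

    Assumes offsets are relative to the start of the filesystem (ddrescue was
    run on the partition) and a 4096-byte block size; adjust both if not.
    """
    return range(start // block_size, (start + size - 1) // block_size + 1)


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for start, size in bad_ranges(f.read()):
            blocks = to_fs_blocks(start, size)
            print(f"bad bytes {start}+{size}: fs blocks {blocks.start}..{blocks.stop - 1}")
```

If ddrescue was run on the whole disk instead, subtract the partition's start offset first. On an ext2/3/4 filesystem, debugfs's `icheck` command should then map block numbers to inodes and `ncheck` map inodes to paths; other filesystems need their own tools.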

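The reason you see a high load average is that once you try to read a bad sector, the hard disk tries very hard (retrying repeatedly) to read that sector. Meanwhile, everything else trying to access the disk has to wait. On Linux, the load average counts not only runnable processes but also processes blocked in uninterruptible I/O wait, so every process stuck in that queue adds to it.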
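To see which processes are feeding the load average while a read is stuck, you can look at the state letter in `/proc/<pid>/stat` on Linux: `R` is runnable and `D` is uninterruptible sleep, typically waiting on disk I/O. This is a rough sketch under that assumption; it counts processes rather than individual threads, so it can undercount compared with the kernel's own figure.

```python
"""Sketch (Linux only): count processes that contribute to the load average."""
import os
from collections import Counter


def state_from_stat(stat_line: str) -> str:
    """Extract the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself contain
    spaces or ')', so split after the *last* ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def load_contributors(proc: str = "/proc") -> Counter:
    """Count processes in state R (runnable) and D (uninterruptible, usually I/O)."""
    counts: Counter = Counter()
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                state = state_from_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state in ("R", "D"):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    print(load_contributors())
```

Running this while `cp` is stuck on the bad file should show the `D` count rise, which is what pushes the load average up.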

As soon as you get the EIO error, the actual load drops quickly. But since the load average is an exponentially decaying average over a period (1, 5 or 15 minutes), the reported number stays high for a while after the load itself is gone.
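The lag can be illustrated with a small simulation. This is a floating-point sketch of the kind of update Linux applies roughly every 5 seconds, `load = load * d + n * (1 - d)` with `d = exp(-5/60)` for the 1-minute average; the kernel itself uses fixed-point arithmetic, and the scenario of 10 stuck tasks is hypothetical.

```python
"""Sketch: how an exponentially decaying load average lags the real load."""
import math
from typing import List, Sequence


def simulate_load(samples: Sequence[float], period_s: float = 60.0,
                  interval_s: float = 5.0, start: float = 0.0) -> List[float]:
    """Return the average after each sample of the instantaneous task count.

    Each step applies  load = load * d + n * (1 - d)  with d = exp(-interval/period).
    """
    decay = math.exp(-interval_s / period_s)
    load = start
    history = []
    for n in samples:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


if __name__ == "__main__":
    # Hypothetical scenario: 10 tasks stuck on a bad sector for 2 minutes,
    # then the EIO arrives and nothing is waiting any more.
    history = simulate_load([10] * 24 + [0] * 24)
    for minute in (1, 2, 3, 4):
        print(f"after {minute} min: {history[minute * 12 - 1]:.2f}")
```

In this scenario the 1-minute figure is still above 3 a full minute after the stall has ended, even though nothing is waiting on the disk any more.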

The reason fsck doesn't find the unreadable files is that it checks the logical integrity of the metadata. To do so it doesn't need to read the actual contents of any files, and reading the contents of every file would be far too slow for normal use of fsck.

Once you have tried to read a file and gotten an error, you should be able to verify what happened by looking in the kernel log (either by running dmesg or by looking in the log files).
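If there is a lot of log output, a small filter helps. The sketch below keeps lines containing "I/O error" or "Medium Error" and extracts any `sector N` numbers. These patterns are an assumption based on common Linux block-layer and SCSI wording; the exact messages vary between kernel versions and drivers, so adjust the regular expressions to what your log actually shows.

```python
"""Sketch: pick read-error lines out of kernel log text."""
import re
import subprocess
from typing import Iterable, List

# Assumed patterns: common Linux wording for failed reads; varies by kernel/driver.
ERROR_RE = re.compile(r"I/O error|Medium Error")
SECTOR_RE = re.compile(r"\bsector (\d+)")


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like disk read errors."""
    return [line for line in lines if ERROR_RE.search(line)]


def error_sectors(lines: Iterable[str]) -> List[int]:
    """Return sector numbers mentioned in those error lines, if any."""
    sectors = []
    for line in error_lines(lines):
        match = SECTOR_RE.search(line)
        if match:
            sectors.append(int(match.group(1)))
    return sectors


if __name__ == "__main__":
    # dmesg may require root when kernel.dmesg_restrict is enabled.
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    for line in error_lines(out.splitlines()):
        print(line)
```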

Trying to read every file on the disk is one way to find all affected files, but it isn't the fastest. Careful interpretation of the output from ddrescue is probably the fastest way to identify which files are affected.
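For the slower approach, reading every file, a minimal sketch could walk the tree, read each regular file to the end, and record the ones where the kernel returns EIO. It skips symlinks, FIFOs and device nodes, and reports other errors (such as permission problems) separately instead of treating them as disk errors. Two caveats: a file still in the page cache may read successfully even if its sectors are bad, and each unreadable file will stall the scan while the drive retries. The function names are hypothetical.

```python
"""Sketch: walk a directory tree and list regular files that fail with EIO."""
import errno
import os
import stat
import sys
from typing import List

CHUNK = 1 << 20  # read 1 MiB at a time


def file_is_readable(path: str) -> bool:
    """Read the whole file; return False if the kernel reports EIO."""
    try:
        with open(path, "rb") as f:
            while f.read(CHUNK):
                pass
    except OSError as exc:
        if exc.errno == errno.EIO:
            return False
        raise  # other errors (e.g. permissions) are not disk errors
    return True


def unreadable_files(root: str) -> List[str]:
    """Return paths of regular files under root that could not be read (EIO)."""
    bad: List[str] = []
    walker = os.walk(root, onerror=lambda e: print(f"skipped dir: {e}", file=sys.stderr))
    for dirpath, _dirnames, filenames in walker:
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue  # skip symlinks, FIFOs, sockets and device nodes
                if not file_is_readable(path):
                    bad.append(path)
            except OSError as exc:
                print(f"skipped {path}: {exc}", file=sys.stderr)
    return bad


if __name__ == "__main__":
    for path in unreadable_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```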
