I recently added a 7th 2TB drive to a Linux md software RAID 6 array. After md finished reshaping the array from 6 to 7 drives (growing it from 8TB to 10TB), I was still able to mount the file system without problems. In preparation for resize2fs, I then unmounted the partition and ran fsck -Cfyv.
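Roughly, the sequence of commands was the following (a sketch from memory; the array name /dev/md0, the new partition /dev/sdh1 and the mount point are only illustrative):

mdadm --add /dev/md0 /dev/sdh1
mdadm --grow /dev/md0 --raid-devices=7
# waited for the reshape to finish (watching cat /proc/mdstat), then:
umount /mnt/raid
fsck -Cfyv /dev/md0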
I was greeted with an endless stream of millions of seemingly random errors. Here is a short excerpt:
Pass 1: Checking inodes, blocks, and sizes
Inode 4193823 is too big. Truncate? yes
Block #1 (748971705) causes symlink to be too big. CLEARED.
Block #2 (1076864997) causes symlink to be too big. CLEARED.
Block #3 (172764063) causes symlink to be too big. CLEARED.
...
Inode 4271831 has a extra size (39949) which is invalid Fix? yes
Inode 4271831 is in use, but has dtime set. Fix? yes
Inode 4271831 has imagic flag set. Clear? yes
Inode 4271831 has a extra size (8723) which is invalid Fix? yes
Inode 4271831 has EXTENTS_FL flag set on filesystem without extents support. Clear? yes
...
Inode 4427371 has compression flag set on filesystem without compression support. Clear? yes
Inode 4427371 has a bad extended attribute block 1242363527. Clear? yes
Inode 4427371 has INDEX_FL flag set but is not a directory. Clear HTree index? yes
Inode 4427371, i_size is 7582975773853056983, should be 0. Fix? yes
...
Inode 4556567, i_blocks is 5120, should be 5184. Fix? yes
Inode 4566900, i_blocks is 5160, should be 5200. Fix? yes
...
Inode 5628285 has illegal block(s). Clear? yes
Illegal block #0 (4216391480) in inode 5628285. CLEARED.
Illegal block #1 (2738385218) in inode 5628285. CLEARED.
Illegal block #2 (2576491528) in inode 5628285. CLEARED.
...
Illegal indirect block (2281966716) in inode 5628285. CLEARED.
Illegal double indirect block (2578476333) in inode 5628285. CLEARED.
Illegal block #477119515 (3531691799) in inode 5628285. CLEARED.
Compression? Extents? I've never had ext4 anywhere near this machine!
Now, the problem is that fsck keeps dying with the following error message:
Error storing directory block information (inode=5628285, block=0, num=316775570): Memory allocation failed
At first I was able to simply re-run fsck and it would die at a different inode, but now it's settled on 5628285 and I can't get it to go beyond that.
I've spent the last few days searching for fixes for this and found the following three "solutions":
- Use 64-bit Linux. /proc/cpuinfo contains lm as one of the processor flags, getconf LONG_BIT returns 64, and uname -a has this to say: Linux <servername> 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux. Should be all good, no?
- Add [scratch_files] / directory = /var/cache/e2fsck to /etc/e2fsck.conf (the resulting section is shown after this list). Did that, and every time I re-run fsck it adds another 500K *-dirinfo-* and an 8M *-icount-* file to the /var/cache/e2fsck directory, so that seems to have its desired effect as well.
- Add more memory or swap space to the machine. 12GB of RAM and a 32GB swap partition should be sufficient, no?
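For reference, the section I added to /etc/e2fsck.conf looks like this (assuming I'm reproducing it here correctly):

[scratch_files]
directory = /var/cache/e2fsck

As I understand it, this tells e2fsck to keep its directory-info and inode-count tables in files in that directory instead of in memory, which matches the *-dirinfo-* and *-icount-* files I see appearing there.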
Needless to say: Nothing helped, otherwise I wouldn't be writing here.
Naturally, the file system is now marked as bad and I can't mount it any more. So, as of right now, I have lost 8TB of data to a disk check?!
This leaves me with 3 questions:
- Is there anything I can do to repair this file system (remember: everything was fine before I ran fsck!) other than spending a month learning the ext3 on-disk format and then trying to fix it manually with a hex editor?
- How is it possible that something as mission-critical as fsck, for a file system as popular as ext3, still has issues like this? Especially since ext3 is over a decade old.
- Is there an alternative to ext3 that doesn't have these sorts of fundamental reliability issues? Maybe JFS?
(I'm using e2fsck 1.42.5 on 64-bit Debian Wheezy 7.1 now, but had the same issues with an earlier version on 32-bit Debian Squeeze.)