0

I want to make sure that nobody has changed a file. To do that, I want to check not only the MD5 sum of the file but also its size, since, as far as I understand, this simple additional check makes forging a matching file harder by several orders of magnitude.

Can I trust the size that stat returns? I don't mean the case where stat itself has been modified; I'm not going that deep. But could someone, for instance, falsify the file size that stat returns by hacking the directory file, or by similar means that do not require superuser privileges?

It's Linux.

codeholic

3 Answers

5

Here's a demo of sparse files which is one way size can be misleading:

$ dd if=/dev/zero of=sparse.out bs=512 seek=100000 count=0
0+0 records in
0+0 records out
0 bytes (0 B) copied, 7.5053e-05 s, 0.0 kB/s
$ echo hi>>sparse.out
$ ls -l sparse.out
-rw-r--r-- 1 user group 51200003 2010-04-13 02:09 sparse.out
$ stat sparse.out
  File: `sparse.out'
  Size: 51200003        Blocks: 24         IO Block: 4096   regular file
Device: 802h/2050d      Inode: 1111111     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1111/  user)    Gid: ( 1111/  group)
Access: 2010-04-13 02:09:11.000000000 -0500
Modify: 2010-04-13 02:09:09.000000000 -0500
Change: 2010-04-13 02:09:09.000000000 -0500
$ hexdump -C sparse.out
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
030d4000  68 69 0a                                          |hi.|
030d4003
$ du sparse.out
12       sparse.out

As you can see, the byte count reported by ls and stat is the apparent size of the file (its logical length, holes included), while only the block count from stat and the output of du are anywhere close to the space the file's actual contents occupy on disk.

Dennis Williamson
  • +1 for an interesting fact, but it's not exactly what I was looking for. The file appears to be 51200003 bytes long, whether you read it or check its size with `stat`, so it doesn't matter how it is physically stored in the filesystem. So as far as I can see, it doesn't compromise the file size in any way. – codeholic Apr 13 '10 at 08:32
  • Well, 'ls -l' does a lstat() syscall, as does 'du' and the 'stat' command-line tool. The difference is just which field(s) they read from the stat struct the syscall returns. – janneb Apr 13 '10 at 12:51
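
To illustrate the comment above, here is a minimal sketch, assuming GNU coreutils `stat` and `truncate` on Linux. The helper name `show_sizes` and the temporary file are made up for this example. It prints two values taken from the same stat structure: the byte size that `ls -l` shows, and the allocated block count (scaled to bytes) that `du` is based on.

```shell
#!/bin/sh
# show_sizes is a hypothetical helper, not a standard command.
# It prints the apparent size (st_size) and the allocated bytes
# (st_blocks times the block unit) from a single stat call.
show_sizes() {
    # %s = size in bytes, %b = allocated blocks, %B = bytes per %b block
    stat -c '%s %b %B' -- "$1" | {
        read -r size blocks unit
        echo "apparent=$size allocated=$((blocks * unit))"
    }
}

tmp=$(mktemp)
truncate -s 51200000 "$tmp"   # extend the file with a hole, like dd seek= above
echo hi >> "$tmp"             # append 3 real bytes after the hole
show_sizes "$tmp"
rm -f "$tmp"
```

The apparent size here is 51200003 bytes regardless of the filesystem; the allocated figure depends on the filesystem and is usually far smaller for a sparse file.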
1

You ask whether someone could falsify the size of the file returned by stat by hacking the directory file. No, that's not possible. A directory is simply a list of file names and inode numbers. All of the other file information (owner, group, mode, size, etc.) is stored in the inode (at least in POSIX-compliant file systems), and that is where stat reads it from.
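
One way to see this for yourself is with a hard link: two directory entries pointing at one inode always report the same size. This is a minimal sketch assuming GNU coreutils `stat` and a filesystem that supports hard links; `inode_and_size` and the file names `a` and `b` are hypothetical.

```shell
#!/bin/sh
# inode_and_size is a hypothetical helper: print "inode size" for a path.
inode_and_size() {
    stat -c '%i %s' -- "$1"
}

dir=$(mktemp -d)
printf 'hello\n' > "$dir/a"
ln "$dir/a" "$dir/b"            # second directory entry, same inode

inode_and_size "$dir/a"
inode_and_size "$dir/b"         # same inode number, same size

printf 'more\n' >> "$dir/b"     # grow the file through one name
inode_and_size "$dir/a"         # the other name reports the new size too
rm -rf "$dir"
```

Since the size lives in the inode, neither directory entry carries a size of its own that could be edited separately.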

TCampbell
0

Why do you care about the size of the file? Comparing MD5 sums will tell you with absolute certainty if the file has changed or not. Flipping bits within the file keeps the size the same, yet the result could be a completely different file.

Brian Tillman
  • 1
    Comparing MD5 sums will tell you with absolute certainty if the file has changed or not. - No. See http://en.wikipedia.org/wiki/Pigeonhole_principle – codeholic Apr 13 '10 at 07:30
  • Yes, MD5 is broken, but I don't think it's broken enough to fear an attack by your users(!) and still rule out a rootkit infection (which could alter the output of both stat and md5). Also, just use a better hash, like SHA-2 or something. – Sven Apr 13 '10 at 07:39
  • 1
    @SvenW Any hash function can return only a finite number of different values. So there are always different inputs that hash to the same value, since the number of possible inputs is infinite. However, it is much harder (not to say impossible) to find different inputs of the same length that have the same hash value. – codeholic Apr 13 '10 at 08:43
  • Well, in theory you are right. But in practice, creating documents with the same hash and size, or even just the same hash, is impossible, at the very least for hash functions that are much better than MD5, i.e. SHA-2. Again, I wonder what your attack scenario might be? – Sven Apr 13 '10 at 10:42
  • 1
    You could always store more than one hash for your files. Why not store both SHA and MD5 hashes? Forging both may be very difficult! – The Unix Janitor Apr 13 '10 at 12:49
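
Pulling the suggestions in this thread together, here is a minimal sketch that records the size plus two hashes and compares them later. It assumes GNU coreutils (`stat`, `sha256sum`, `md5sum`); the helper names `fingerprint` and `unchanged` are made up for this example, and the recorded value would need to be kept somewhere the untrusted user cannot write.

```shell
#!/bin/sh
# fingerprint is a hypothetical helper: print "size sha256 md5" for a file.
fingerprint() {
    size=$(stat -c '%s' -- "$1") || return 1
    sha=$(sha256sum < "$1" | cut -d' ' -f1)
    md5=$(md5sum < "$1" | cut -d' ' -f1)
    echo "$size $sha $md5"
}

# unchanged FILE RECORDED: succeed only if FILE still matches RECORDED.
unchanged() {
    [ "$(fingerprint "$1")" = "$2" ]
}

f=$(mktemp)
printf 'original\n' > "$f"
saved=$(fingerprint "$f")        # keep this out of the user's reach

printf 'tampered\n' > "$f"       # same length, different content
if unchanged "$f" "$saved"; then
    echo "unchanged"
else
    echo "modified"
fi
rm -f "$f"
```

In this run the size is identical before and after, so the hashes are what catch the change; the script prints `modified`.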