1

I'm trying to find the number of bytes read from two different files with the following code, but the length always comes out as 1, which is obviously wrong. Eventually I want to compare the two files block by block and print the position where they differ, as you'll see. I wasn't getting anywhere, so I added printf statements to narrow down the problem, and it looks as if the length isn't being calculated properly.

A side note that may be relevant: I found this description of memcmp's return value. Does it mean I can't use != with it?

If the return value is < 0, str1 is less than str2.

If the return value is > 0, str2 is less than str1.

If the return value is = 0, str1 is equal to str2.
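On the side question: since memcmp returns 0 only when every compared byte matches, testing the result with != 0 is a valid "do they differ?" check. The sign only matters when an ordering is needed. A minimal sketch follows; the helper names buffers_differ and first_difference are hypothetical and not part of the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when the first n bytes of a and b differ. */
static bool buffers_differ(const void *a, const void *b, size_t n)
{
    /* memcmp returns 0 only if all n bytes match, so != 0 means "different". */
    return memcmp(a, b, n) != 0;
}

/* Hypothetical helper: index of the first differing byte, or n if none differ. */
static size_t first_difference(const unsigned char *a,
                               const unsigned char *b, size_t n)
{
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```

Calling buffers_differ first and first_difference only on a mismatch mirrors the "memcmp for speed, then scan" structure of the code below.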

Help please!

 void compare_two_binary_files(int f1, int f2)
 {
         ssize_t byte_read_f1, byte_read_f2, length, numRead, bob, length2;
         char buf1[BUF_SIZE], buf2[BUF_SIZE], a[100], b[100], counter[100];
         int count = 0, b_pos1, b_pos2;
         while ((byte_read_f1 = read(f1, buf1, sizeof buf1) > 0) && (byte_read_f2 = read(f2, buf2, sizeof buf2) >0)) {
                 length = byte_read_f1;
                 length2 = byte_read_f2;
                 printf("F1 byte length:%o\n", length);
                 printf("F2 byte length:%o\n", length2);
                 ssize_t len =  byte_read_f1 <byte_read_f2 ? byte_read_f1 : byte_read_f2;
                 b_pos1 = memcmp(buf1, buf2, len);
                 printf("Memcmp: %d\n", b_pos1);
                 if (memcmp(buf1, buf2, len) != 0){  // use memcmp for speed
                         ssize_t i;
                         for (i = 0; i<len; i++){
                                 if (buf1[i] != buf2[i]) break;
                         }
 }
Question_Guy
  • Is this the actual code? It won't even compile without two more closing braces. – Paul R Jan 30 '14 at 22:16

3 Answers

3

I think there's a mistake in the while condition:

while ((byte_read_f1 = read(f1, buf1, sizeof buf1) > 0) && 
       (byte_read_f2 = read(f2, buf2, sizeof buf2) >0))

Here read(f1, buf1, sizeof buf1) > 0 is evaluated first, and its result (1 for true) is what gets assigned to byte_read_f1. The same happens with byte_read_f2.

Does this help?

Update

As the comments suggest, the correct form is:

while (((byte_read_f1 = read(f1, buf1, sizeof buf1)) > 0) && 
       ((byte_read_f2 = read(f2, buf2, sizeof buf2)) >0))
MauriDev
1

> has a higher precedence than =, therefore

if ((byte_read_f1 = read(f1, buf1, sizeof buf1) > 0) && ...)

is equivalent to

if ((byte_read_f1 = (read(f1, buf1, sizeof buf1) > 0)) && ...)

which assigns 1 to byte_read_f1 if at least one byte was read.

What you want is

if ((byte_read_f1 = read(f1, buf1, sizeof buf1)) > 0 && ...)
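The difference between the two forms can be seen without touching any files. In the sketch below, fake_read is a hypothetical stand-in for read() that always reports 512 bytes; the two functions differ only in where the parentheses sit.

```c
#include <sys/types.h>  /* ssize_t */

/* Hypothetical stand-in for read(): pretends 512 bytes were read. */
static ssize_t fake_read(void)
{
    return 512;
}

/* Without the extra parentheses, the comparison runs first
   and its result (0 or 1) is what gets stored in n. */
static ssize_t without_parens(void)
{
    ssize_t n;
    n = fake_read() > 0;
    return n;
}

/* With the extra parentheses, the byte count is stored in n
   and only then compared against 0. */
static ssize_t with_parens(void)
{
    ssize_t n;
    int ok = (n = fake_read()) > 0;
    (void)ok;  /* the comparison result is not needed here */
    return n;
}
```

Here without_parens() yields 1 and with_parens() yields 512, which matches the length of 1 reported in the question.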

If your program reads from anything other than regular files (such as standard input), you also have to consider the case that read() returns fewer bytes than requested.

For regular files, read() in practice returns the number of bytes requested, the only exception being at end-of-file. Therefore, if the files differ in length, at some point read(f1, ...) will return a different amount than read(f2, ...), or one will return zero and the other will not.
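If short reads do have to be handled (pipes, terminals), one common approach is to keep calling read() until the buffer is full or end-of-file is reached. The helper below is a sketch under that assumption; the name read_full is hypothetical and does not come from the original answer.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call read() repeatedly until `count` bytes have
   arrived, end-of-file is reached, or an error occurs.
   Returns the number of bytes stored in buf, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n == 0)
            break;                 /* end-of-file */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, retry */
            return -1;             /* real error */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

With such a helper, both buffers are filled to the same size on every iteration except the last, so a difference in the returned counts indicates that one input ended before the other.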

Martin R
  • Where would I adjust the file offsets with `lseek()`? I'm guessing before the while loop? And then do an if statement with `SEEK_CUR` to compare if the two files are at different positions and if so then to back track or whatever? – Question_Guy Jan 30 '14 at 23:44
  • @OSU_2016: I thought about it again. If you compare only *regular files* then read() always reads as many characters as you requested, with the only exception at end-of-file of course. - If your program has to handle other file descriptors as well, for example standard input from the terminal, then read() may return fewer characters. But you cannot seek on those file descriptors. In that case you would have to go with the second suggestion. – Martin R Jan 31 '14 at 06:10
  • My program is meant to mimic the `cmp` command. I'm only worrying about comparing two files, not standard input. So I'd still need `lseek()`, right? – Question_Guy Jan 31 '14 at 21:21
  • @OSU_2016: The cmp command *can* read from standard input. But if you read from regular files only then read() reads as much as possible, so you don't have to seek. – Martin R Jan 31 '14 at 21:26
  • OK. So does this mean I don't need to worry, and just leave it as is? So if two files differ in length, my code will still work? – Question_Guy Jan 31 '14 at 22:46
  • @OSU_2016: If the files differ in length then the *last* read will of course read different amounts from the files. - Why don't you try it yourself? – Martin R Feb 01 '14 at 08:50
  • Obviously I'll try it myself, I was just trying to understand what you meant by "you don't have to seek" – Question_Guy Feb 01 '14 at 17:42
  • @OSU_2016: OK, sorry. I will update the answer later (when I have more time) to make more clear what I meant. – Martin R Feb 01 '14 at 18:09
0

How about seeking to the end of each file with fseek() (or lseek(), since you are working with file descriptors) and querying the resulting file position offset? The fstat() call will also tell you the size directly, in the st_size field.
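A minimal sketch of the fstat() route, assuming the file descriptors from the question refer to regular files (st_size is not meaningful for pipes or terminals). The helper name file_size is hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: size in bytes of the file behind fd,
   or -1 if fstat() fails. Intended for regular files only. */
static off_t file_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size;
}
```

Comparing file_size(f1) with file_size(f2) before the read loop would show early whether the files can be identical at all, although finding the first differing byte still requires reading both.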

Phil Perry