
The program works correctly on Linux, but when I run it on Windows or through Wine I get extra characters after the end of the file. It is not garbage but repeated text that was already written. The issue occurs whether I write to stdout or to a file, but it doesn't happen with small files; it takes a file of a few hundred KB to trigger it.

I narrowed the issue down to this function:

static unsigned long read_file(const char *filename, const char **output)
{
    struct          stat file_stats;
    int             fdescriptor;
    unsigned long   file_sz;
    static char     *file;

    fdescriptor = open(filename, O_RDONLY);
    if (fdescriptor < 0 || (fstat(fdescriptor ,&file_stats) < 0))
    {   printf("Error opening file: %s \n", filename);
        return (0);
    }
    if (file_stats.st_size < 0)
    {   printf("file %s reports an Incorrect size", filename);
        return (0);
    }
    file_sz = (unsigned long)file_stats.st_size;
    file = malloc((file_sz) * sizeof(*file));
    if (!file)
    {   printf("Error allocating memory for file %s of size %lu\n", filename, file_sz);
        return (0);
    }
    read(fdescriptor, file, file_sz);
    *output = file;
    write(STDOUT_FILENO, file, file_sz), exit(1); //this statement added for debugging.
    return (file_sz);
}

I can't debug through Wine, much less on Windows, but using printf statements I can tell that the file size is correct. The issue is either in the reading or in the writing, and without a debugger I can't inspect the contents of the buffer in memory.

The program was compiled with x86_64-w64-mingw32-gcc version 8.3, which is the same GCC version installed on my system.

At this point I'm just perplexed; I would love to hear any ideas you may have.

Thank you.

Edit: The issue was that `read` returned fewer bytes than the file size reported by `fstat`, and I was writing the full reported size. Thanks to Matt for telling me where to look.

  • Thank you, I have the terrible habit of never checking return values unless I need them for something, which will come back to bite me if a call ever fails, like in this case. 639439 bytes are read, but I write 656830. That is the issue right there. – GlorifiedBum May 13 '20 at 07:06
  • I'm pretty sure that `read` hasn't actually failed here, it's just that the size reported by `fstat` isn't necessarily equal to the number of bytes you can actually read from a file. You can see that on linux too, by the way: just try to read some of the text "files" in `/proc` or `/sys` and you'll find that the size reported by `fstat` will often be too big. – Felix G May 13 '20 at 07:20
  • Welcome to Stack Overflow! If you found your solution, please post it as an answer (below), don’t put it into the question. https://stackoverflow.com/help/self-answer – Melebius May 13 '20 at 07:33
  • suggest opening the file in binary mode too – M.M May 13 '20 at 08:11
  • @M.M I must confess that your comment perplexed me. I thought `open` already worked in binary mode. I thought I had a vague notion of what "binary" mode was, but I was wrong. This is the best explanation I've found: https://www.cygwin.com/cygwin-ug-net/using-textbinary.html So on Linux there is no binary mode, but on Windows I need to be mindful of it. Of course I have no way to do this from Linux unless I add special cases. I'll add this to the list of incompatibilities I've found so far. So much for "portable code". – GlorifiedBum May 13 '20 at 12:43
  • There should be an `O_BINARY` open flag in a compiler targeting Windows – M.M May 13 '20 at 21:51
  • You are right, the compiler accepts it. I added some preprocessor directives and it now works for both targets. Thanks. – GlorifiedBum May 16 '20 at 03:49
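The preprocessor approach from the last two comments can be sketched as follows. It assumes the only per-target difference needed is the `O_BINARY` flag: Windows toolchains such as MinGW define it in `<fcntl.h>`, while glibc on Linux does not, so the sketch defines it as 0 where it is missing. On Windows, `open` without `O_BINARY` normally uses text mode, which translates CR-LF pairs to LF while reading; that is the likely reason `read` returned fewer bytes than `fstat` reported. The helper name `open_binary` is hypothetical.

```c
#include <fcntl.h>      /* open, O_RDONLY, and O_BINARY on Windows toolchains */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write, close */

/* POSIX systems have no text/binary distinction, so O_BINARY may be absent.
 * Defining it as 0 makes "O_RDONLY | O_BINARY" equivalent to O_RDONLY there,
 * while on Windows it turns off CR-LF translation. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: open a file read-only in binary mode on every target. */
static int open_binary(const char *filename)
{
    return open(filename, O_RDONLY | O_BINARY);
}
```

With this in place, the `open(filename, O_RDONLY)` call in `read_file` could become `open_binary(filename)` and the rest of the function would stay unchanged.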

1 Answer


`read` can return fewer bytes than the size reported by `fstat`. I was writing the reported file size instead of the number of bytes actually read, which caused the extra output. When writing the data back out, use the byte count returned by `read`.
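A minimal sketch of that fix, assuming a single `read` into a buffer already sized from `fstat`, as in the question. `copy_once` is a hypothetical helper; it takes the output descriptor as a parameter so it works for stdout or a file.

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write */

/* Hypothetical helper: one read() followed by one write() of exactly the
 * number of bytes that read() returned, never the size from fstat(). */
static ssize_t copy_once(int in_fd, int out_fd, char *buf, size_t buf_sz)
{
    ssize_t got = read(in_fd, buf, buf_sz);   /* may be less than buf_sz */

    if (got <= 0)
        return got;          /* -1 on error, 0 at end of file */

    /* Only the first `got` bytes of buf were filled in by read(); the rest
     * is whatever the buffer held before, so it must not be written. */
    return write(out_fd, buf, (size_t)got);
}
```

In the question's terms this amounts to passing the value returned by `read` to `write` instead of `file_sz`. It still assumes one `read`/`write` pair is enough, which is not guaranteed.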

It is always best both to check the return values of `read` and `write` for failure and to make sure all bytes have been read, because `read` can return fewer bytes than requested, for example when reading from a pipe or when interrupted by a signal. In that case multiple calls are necessary.
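The looping described above might look like this minimal sketch, assuming a POSIX-like environment where an interrupted call fails with `errno == EINTR` and retrying is the desired policy. `read_all` and `write_all` are hypothetical names. Passing the `fstat` size as `want` is safe because the loop stops at end of file; the count it returns, not `want`, is what should be written out.

```c
#include <errno.h>      /* errno, EINTR */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write */

/* Hypothetical helper: call read() until `want` bytes have arrived, end of
 * file is reached, or a real error occurs. Returns the number of bytes
 * actually stored in buf, or -1 on error. */
static ssize_t read_all(int fd, char *buf, size_t want)
{
    size_t total = 0;

    while (total < want) {
        ssize_t n = read(fd, buf + total, want - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* genuine error */
        }
        if (n == 0)
            break;             /* end of file: fewer bytes than requested */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical helper: same idea for write(), looping until every byte has
 * been written. Returns the count written, or -1 on error. */
static ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;             /* no progress: report the partial count */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Used together, `ssize_t got = read_all(fd, file, file_sz);` followed by `write_all(STDOUT_FILENO, file, (size_t)got)` (after checking `got >= 0`) would write only the bytes that were actually read.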

Thanks to Mat and Felix for the answer.