
For a homework assignment I wrote a simple compression/decompression program that uses a naive implementation of run-length encoding. The program works: compressing and then decompressing any text file, even a fairly large one (e.g. the program's own source), reproduces it exactly. As an experiment I tried to compress and decompress the binary of the compression program itself. This resulted in a file that was much smaller than the original binary and, obviously, cannot be run. What is causing this data loss?

My assumption was that it's related to how binary files are represented, but I can't figure much out past that.

grimetime
  • Are you opening the file in binary mode? – NPE Apr 01 '13 at 06:54
  • Binary files are just an unformatted stream of 1's and 0's – Prashant Shilimkar Apr 01 '13 at 06:55
  • @NPE No, my program just reads in from the input using `getchar()`. Isn't this just grabbing bits from the file 8 at a time and returning the integer value of them? – grimetime Apr 01 '13 at 07:01
  • Please show us your code. – NPE Apr 01 '13 at 07:02
  • I'm not going to be able to do that right away, the program assignment due date is still a few days from now. – grimetime Apr 01 '13 at 07:03
  • @grimetime: if you don't open the file/stream as binary, then on some platforms reading the file will transform line-endings to map them to a `'\n'` character (even for `getchar()`). Also, some platforms will treat a particular control character as an EOF (Windows does this when it encounters a Ctrl-Z if the file is opened in text mode). However, on Linux you will not run into these problems, but you should still open the files in binary mode in case the program is ever built for Windows. – Michael Burr Apr 01 '13 at 07:06
  • "I'm not going to be able to do that right away" -- Then why even bother to post your question? Without the code, all anyone can do is guess what the cause is. – Jim Balter Apr 01 '13 at 07:08
  • Read http://en.wikipedia.org/wiki/Executable_and_Linkable_Format to understand what a binary executable is on Linux – Basile Starynkevitch Apr 01 '13 at 07:31

2 Answers


Possible issues:

  • Your program opens the binary file in text mode, which can corrupt '\r' and '\n' bytes on platforms that translate line endings
  • Your program mishandles zero bytes, treating them as string terminators ('\0') rather than as data in their own right
  • Your program uses char (which is signed on many platforms, including x86 Linux) for the data bytes, so it works only with non-negative values. ASCII characters in English text are all non-negative, but arbitrary bytes may be negative when stored in a signed char
  • Your program has an overflow somewhere which shows up only on big files
  • Your program has some other data-dependent bug
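Several of the points above (zero bytes, signed char, counter overflow) can be avoided together by working on `unsigned char` buffers with explicit lengths. The following is only a minimal sketch, not the asker's code: the names `rle_encode` and `rle_decode` and the (count, byte) pair format are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format: every run is stored as a (count, byte) pair,
   with count in 1..255. Lengths are passed explicitly, so 0x00 is
   ordinary data, and unsigned char keeps bytes >= 0x80 non-negative. */
size_t rle_encode(const unsigned char *in, size_t n,
                  unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    size_t i = 0;
    while (i < n) {
        unsigned char b = in[i];
        size_t run = 1;
        /* Cap the run at 255 so the count always fits in one byte. */
        while (i + run < n && in[i + run] == b && run < 255)
            run++;
        assert(o + 2 <= out_cap);   /* caller must supply enough space */
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    return o;                       /* number of bytes written */
}

size_t rle_decode(const unsigned char *in, size_t n,
                  unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    assert(n % 2 == 0);             /* input must be whole pairs */
    for (size_t i = 0; i + 1 < n; i += 2) {
        size_t run = in[i];
        assert(o + run <= out_cap);
        for (size_t k = 0; k < run; k++)
            out[o++] = in[i + 1];
    }
    return o;                       /* number of bytes written */
}
```

Nothing in this sketch calls a string function, so a 0x00 byte is encoded like any other value, and a run longer than 255 bytes is simply split into several pairs.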
Alexey Frunze

If the platform is Linux (as the question is tagged), there's no difference between binary and text modes, so that shouldn't be the cause. Even so, the files should be opened as binary.
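As a rough illustration of opening files as binary, here is a sketch of a byte-for-byte copy. The helper name `copy_binary` is hypothetical; the only standard pieces are `fopen` with the `"rb"` and `"wb"` mode strings and `fgetc`/`fputc`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy src_path to dst_path byte for byte.
   The 'b' in "rb"/"wb" requests binary mode. It has no effect on
   Linux; on Windows it turns off line-ending translation and
   Ctrl-Z end-of-file handling.
   Returns the number of bytes copied, or -1 on error. */
long copy_binary(const char *src_path, const char *dst_path)
{
    assert(src_path != NULL && dst_path != NULL);

    FILE *in = fopen(src_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    long copied = 0;
    int c;  /* int, not char: must hold 0..255 and also EOF */
    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF) {
            copied = -1;
            break;
        }
        copied++;
    }

    if (fclose(out) == EOF)
        copied = -1;
    fclose(in);
    return copied;
}
```

Storing the result of `fgetc` in an `int` matters here: a `char` variable cannot hold every byte value and `EOF` at the same time.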

I suspect that your problem is that the program treats '\0' characters as terminators (or otherwise specially) instead of as valid data.
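To make the '\0' suspicion concrete, the sketch below contrasts the number of bytes actually read from a stream with the length a string-style routine would see for the same buffer. Both function names (`read_all`, `length_as_c_string`) are made up for this example, and this is a guess at the kind of bug involved, not a diagnosis of the unseen code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read up to cap bytes from fp into buf.
   Returns the number of bytes actually read, zeros included. */
size_t read_all(FILE *fp, unsigned char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL);
    size_t total = 0;
    size_t got;
    while (total < cap &&
           (got = fread(buf + total, 1, cap - total, fp)) > 0)
        total += got;
    return total;
}

/* Length a string-style routine would report for the same data:
   it stops at the first 0x00 byte. The search is bounded by n so
   it never reads past the end of the buffer. */
size_t length_as_c_string(const unsigned char *buf, size_t n)
{
    const unsigned char *zero = memchr(buf, 0, n);
    return zero != NULL ? (size_t)(zero - buf) : n;
}
```

An executable typically contains many zero bytes, often within the first few dozen bytes of the file, so any code path that relies on a terminator rather than on the count returned by `fread` would silently drop most of the data. That would be consistent with an output much smaller than the input.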

Michael Burr