
I have code that reads a bzip2-compressed file using bzip2's zlib-compatibility functions. It works in principle, but reading stops after exactly 900,000 bytes, which is the block size used during compression. How do I read past the block boundary and into the next block using these functions?

Here is some very basic test code (error handling removed):

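#include <bzlib.h>
#include <cstdio>

int main() {
  char buf[1024];
  long ntot = 0;
  BZFILE *h = BZ2_bzopen("file.bz2", "rb");
  int n;
  // BZ2_bzread returns 0 at the end of the stream and -1 on error.
  while ((n = BZ2_bzread(h, buf, sizeof buf)) > 0) {
    printf("%d bytes read\n", n);
    ntot += n;
  }
  BZ2_bzclose(h);
  printf("%ld bytes read in total\n", ntot);
}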
Shawn
user52366
  • While some snippet of code could be valid in multiple languages, and a problem be solvable using multiple languages, please only tag the language you're actually programming in. – Some programmer dude Apr 30 '19 at 11:47
  • Does `h->lastErr` contain a useful error code? – Botje Apr 30 '19 at 11:57
  • Do you do any error reporting using the `BZ2_bzerror()` function mentioned in that documentation? – Shawn Apr 30 '19 at 12:02
  • Also, is `bunzip2` happy with your file? For all we know your file was truncated after one block... – Botje Apr 30 '19 at 12:04
  • Yes, the file is valid and there is no error. The compatibility functions just read exactly one block. Looking at the source of bzip2, e.g. https://github.com/enthought/bzip2-1.0.6/blob/master/bzip2.c from line 452 onward, it seems the blocks are read separately and the "unused" bytes of the last block must be copied to the next block, as described in the docs. If I am right, the zlib compatibility functions simply do not handle this case (see https://github.com/enthought/bzip2-1.0.6/blob/master/bzlib.c from line 1478 onward). This would mean they are unusable for larger files, I think. – user52366 Apr 30 '19 at 16:15

0 Answers