Info on Minecraft's region files

Minecraft's region files are laid out in three sections. The first two are tables giving where the chunks are stored in the file and information about the chunks themselves. In the final section, each chunk is stored as a 4-byte length, a 1-byte compression type (almost always zlib, RFC 1950), and then the compressed data.
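As a rough illustration of the first section, here is a minimal Python sketch of reading the location table. It assumes the layout described on the linked format page: 1024 entries of 4 bytes each, made of a 3-byte big-endian sector offset and a 1-byte sector count, with 4096-byte sectors. The names `read_chunk_locations` and `SECTOR_SIZE` are made up for this example.

```python
SECTOR_SIZE = 4096  # assumption from the format page: 4 KiB sectors


def read_chunk_locations(region: bytes) -> dict:
    """Map chunk index (0..1023) to (byte offset, byte size) for stored chunks."""
    locations = {}
    for index in range(1024):
        # Each entry is 4 bytes: a 3-byte big-endian sector offset
        # followed by a 1-byte sector count.
        entry = region[index * 4:index * 4 + 4]
        sector_offset = int.from_bytes(entry[:3], "big")
        sector_count = entry[3]
        if sector_offset == 0 and sector_count == 0:
            continue  # no data stored for this chunk
        locations[index] = (sector_offset * SECTOR_SIZE,
                            sector_count * SECTOR_SIZE)
    return locations
```

A sector offset of 295, for instance, corresponds to byte offset 295 × 4096 = 1208320 (0x127000).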

Here's more (probably better) information: https://minecraft.gamepedia.com/Region_file_format

The problem

I have a program that successfully loads chunk data. However, I'm not able to find how big the chunks will be when decompressed, and so I just use a maximum amount it could take when allocating space.

In the player data files, they do give the size that it takes when decompressed, and (I think) it uses the same type of compression.

The end of a player.dat file giving the size of the decompressed data (in little-endian):


This is the start of the chunk data, first 4 bytes giving how many bytes is in the following compressed data:


Mystery data

However, if I look where the compressed data specifically "ends", there's still a lot of data after it. This data doesn't seem to have a use, but if I try to decompress any of it with the rest of the chunk, I get an error.

Highlighted chunk data, and unhighlighted mystery data:


Missing decompressed size (header?)

And there's no decompressed size (or header? I could be wrong here) given. The final size of this example chunk is 32,562 bytes, and this number (or any close neighbour) is nowhere to be found within the chunk data or mystery data. (I checked both big-endian and little-endian.)

Decompressed data terminating at index 32562, (Visual Studio locals watch):


Final Questions

Is there something I'm missing? Is this compression actually different from the player data compression? What's the mystery data? And am I stuck loading in 1<<20 bytes every time I want to load a chunk from a region file?

Thank you for any answers or suggestions

Files used

Isolated chunk data: https://drive.google.com/file/d/1n3Ix8V8DAgR9v0rkUCXMUuW4LJjT1L8B/view?usp=sharing

Full region data: https://drive.google.com/file/d/15aVdyyKazySaw9ZpXATR4dyvhVtrL6ZW/view?usp=sharing

(Not linking player data for possible security reasons)

In the region data, the chunk data starts at index 1208320 (or 0x127000)

Mark Adler
zandgall

1 Answer

The format information you linked is quite helpful. Have you read it?

In there it says: "The remainder of the file consists of data for up to 1024 chunks, interspersed with unused space." Furthermore, "Minecraft always pads the last chunk's data to be a multiple-of-4096B in length" (Italics mine.) Everything is in multiples of 4K, so the end of every chunk is padded to the next 4K boundary.

So your "mystery" data is not a mystery at all, as it is entirely expected per the format documentation. That data is simply junk to be ignored.

Note from the documentation that the data "length" in the first four bytes of the chunk is actually one more than the number of bytes of compressed data that follow the five-byte header, because it also counts the compression-type byte.
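To make the header arithmetic concrete, here is a small Python sketch (the language is my choice, since none was given). It assumes the five-byte header is a 4-byte big-endian length followed by one compression-type byte, as the format page describes. The function name `split_chunk` is hypothetical.

```python
import struct


def split_chunk(region: bytes, offset: int):
    """Return (compression_type, compressed_payload) for the chunk at offset."""
    # 4-byte big-endian length, then 1 byte naming the compression scheme.
    length, compression = struct.unpack_from(">IB", region, offset)
    # The length counts the compression byte too, so the compressed
    # payload is length - 1 bytes long and starts after the 5-byte header.
    start = offset + 5
    payload = region[start:start + length - 1]
    return compression, payload
```

Anything between the end of that payload and the next 4096-byte boundary is the padding, and is never handed to the decompressor.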

Also from the documentation, there is indeed no uncompressed size provided in the format.

zlib was designed for streaming data, where you don't know ahead of time how much there will be. You can use inflate() to decompress into whatever buffer size you like. If there's not enough room to finish, you can either do something with that data and then repeat into the same buffer, or you can grow the buffer with realloc() in C, or the equivalent for whatever language you're using. (Not noted in the question or tags.)

Mark Adler