I am trying to extract the initial RAM filesystem that is embedded in a kernel image, modify it, and repack the kernel image with it. A search of GitHub turns up several projects that do this. My problem with those scripts is that they are very specific, they hard-code values that I cannot use with the kernel image I am working with, or they are simply hard to reverse engineer. The kernel image I am working with uses XZ to compress the kernel and GZip to compress the initial RAM filesystem CPIO archive. The end of a GZip stream is easy to locate by looking for a particular string, and those GitHub projects seem to need to know the exact end of these streams.

I am able to extract the compressed portion of the kernel like this:

# Byte offset of the second XZ magic match in zImage (LC_ALL=C so grep matches raw bytes)
offset=$(LC_ALL=C grep -aob $'\xFD\x37\x7A\x58\x5A\x00' zImage | cut -d ":" -f 1 | sed -n 2p)
dd bs=1 skip="$offset" if=zImage | xzcat > Image

I can then extract the initial RAM filesystem CPIO archive from Image like this:

# Byte offset of the second GZip magic match in the decompressed Image
offset=$(LC_ALL=C grep -aob $'\x1F\x8B\x08' Image | cut -d ":" -f 1 | sed -n 2p)
dd bs=1 skip="$offset" if=Image | zcat > initramfs.cpio

From reading those GitHub scripts, it seems that one must know the end address of the compressed streams. How do I find the end of an XZ stream in a shell script?

Melab

1 Answer

The size of each Block in an xz stream is recorded in an Unpadded Size field. That field is stored in the Block's Record in the stream's Index, which sits near the end of the stream, rather than in the Stream Header.
The Unpadded Size field indicates the size of the Block excluding the Block Padding field.

i.e. Unpadded Size = size-of(Block Header + Compressed Data + Check field)


One quick way to obtain the compressed size is to ask the xz tool itself. In its machine-readable listing, field 4 of the second line is the compressed size in bytes:

xz --robot --list <xz-file-or-stream> | cut -f 4 | sed -n 2p

Reference: The .xz File Format specification.


Note: Unpadded Size is stored using the encoding described in Section 1.2 Multibyte Integers of the xz file-format specification. The value MUST never be zero; with the current structure of Blocks, the actual minimum value for Unpadded Size field is five.
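To make the multibyte integer encoding concrete, here is a minimal bash sketch of a decoder. It assumes the encoding from Section 1.2 of the specification: each byte carries seven bits of the value, least significant group first, and a set high bit means another byte follows. The function name decode_vli and the convention of passing bytes as hex arguments are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash

# decode_vli: decode an xz multibyte integer (hypothetical helper name).
# Arguments: the encoded bytes as two-digit hex strings, in file order.
# Prints the decoded value; returns 1 if the input ends before a
# terminating byte (one with the high bit clear) is seen.
decode_vli() {
  local value=0 bits=0 byte
  for byte in "$@"; do
    # The low 7 bits of each byte are payload, least significant group first.
    value=$(( value | ((0x$byte & 0x7F) << bits) ))
    # A clear high bit marks the last byte of the integer.
    if (( (0x$byte & 0x80) == 0 )); then
      echo "$value"
      return 0
    fi
    bits=$(( bits + 7 ))
  done
  return 1
}
```

For example, decode_vli 80 01 prints 128, because the first byte contributes no payload bits and the second contributes 1 shifted left by seven.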

Implementation note: Because the size of the Block Padding field is not included in Unpadded Size, calculating the total size of a Stream or doing random-access reading requires calculating the actual size of the Blocks by rounding Unpadded Sizes up to the next multiple of four.
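The rounding described in the implementation note can be sketched in bash as follows. The helper names padded_block_size and total_block_bytes are hypothetical. The sketch covers only the Blocks: a complete stream size would also need the Stream Header, the Index, and the Stream Footer added on top.

```shell
#!/usr/bin/env bash

# padded_block_size: round an Unpadded Size up to the next multiple of four,
# giving the number of bytes the Block actually occupies in the stream.
padded_block_size() {
  local unpadded=$1
  echo $(( (unpadded + 3) / 4 * 4 ))
}

# total_block_bytes: sum the padded sizes of all Blocks, given each
# Block's Unpadded Size as a separate argument.
total_block_bytes() {
  local total=0 size
  for size in "$@"; do
    total=$(( total + $(padded_block_size "$size") ))
  done
  echo "$total"
}
```

For example, padded_block_size 5 prints 8, and total_block_bytes 5 8 13 prints 32 (8 + 8 + 16).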

TheCodeArtist