
I am working on a data generator that writes binary files into a closed system, for which we assume there is no risk of external attack or malicious intent.

The idea is to give these binaries a quick and reasonably reliable built-in way to show that they have not been accidentally corrupted in the toolchain, so that the final receiver can check the file's integrity without any additional resource (such as a file.sha1 containing the checksum).

As it may affect the solution, I should mention that a binary file can hold anywhere from 1 kB to perhaps 300 MB of data. There is a specific position in the file where I can place a checksum; its length can be any fixed value, as long as it is the same for all files. The position is already defined and I cannot change it, but I can change the length.

So if the solution is to reserve 128 bytes at this position to cover every possible case, then all binaries will contain a 128-byte field at this position.

As it would be impossible to include a cryptographic hash of the file inside the file itself without cutting it out before checking, I read that CRC32 is a good way to achieve this goal. I have also looked at utilities such as "spoof" and "CRC manipulator", but they do not seem to fit my case.

Here is a quick example of what I need. Let's consider a binary file:

               This position will never change
               v
1011101100010010000000010110011100100110
               ^^^^^^^^
               This is the fixed-length part dedicated to file integrity check

I would like to find a way to insert the right checksum so that the complete file, including the checksum, has the same overall checksum. Maybe a known program already does this?

Thanks for your support

someone

2 Answers


You can still use a cryptographic hash: just copy out the hash and then zero out the hash block before checking the hash. Likewise, you'll need to ensure the hash block is all zeros before hashing the file.

You can use dd in a shell script to copy data into a specific byte position in a file. Or you can use standard random-access file I/O in any good programming language.
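As a rough illustration of the zero-then-hash procedure, here is a minimal in-memory sketch in C. The names embed_digest, check_digest and DIGEST_LEN are made up for this example, and the digest() function is only a stand-in (64-bit FNV-1a, which is not cryptographic) marking the spot where a real hash such as SHA-256 from a crypto library would be called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 8    // hypothetical field width; SHA-256 would need 32

// Stand-in digest (64-bit FNV-1a). NOT cryptographic: it only marks where a
// real hash function from a crypto library would be called.
static uint64_t digest(const unsigned char *buf, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Store / load a 64-bit value in little-endian order.
static void put8(uint64_t v, unsigned char *pos) {
    for (int i = 0; i < 8; i++)
        pos[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t get8(const unsigned char *pos) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)pos[i] << (8 * i);
    return v;
}

// Generator side: zero the field, hash the whole buffer, store the digest.
void embed_digest(unsigned char *file, size_t len, size_t off) {
    assert(off <= len && len - off >= DIGEST_LEN);
    memset(file + off, 0, DIGEST_LEN);
    put8(digest(file, len), file + off);
}

// Receiver side: copy the digest out, zero the field, rehash, then restore
// the field so the buffer is left unchanged. Returns 1 if the digests match.
int check_digest(unsigned char *file, size_t len, size_t off) {
    assert(off <= len && len - off >= DIGEST_LEN);
    uint64_t stored = get8(file + off);
    memset(file + off, 0, DIGEST_LEN);
    uint64_t actual = digest(file, len);
    put8(stored, file + off);
    return stored == actual;
}
```

For the files in the question, off would be the fixed position of the integrity field. A real 300 MB file would presumably be hashed in chunks through the library's incremental interface rather than loaded into one buffer.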

However, I would question whether this complexity is really necessary. The more standard solution would be to put the hash separately (either in a separate file, or at the very start or end of the file) which avoids this problem entirely. Putting the hash at the end of the file might make it slightly more inconvenient to read the file, depending on the file format, but putting the hash at the start of the file shouldn't have that problem.

Robin Green

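You need to compute the CRC forwards from the start of the data up to the insertion point, and backwards from the end of the data to the insertion point (with the CRC field itself set to zero), and put the exclusive-or of those two there. Then the CRC of the whole thing will be a constant. (Assuming no corruption.)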

Here is example code:

// Example of the generation of a "middle" CRC, which is inserted somewhere in
// the middle of a sequence, where the CRC is generated such that the CRC of
// the complete sequence will be zero. This particular CRC has no pre or post
// processing.
//
// Placed into the public domain by Mark Adler, 11 May 2016.

#include <stddef.h>         // for size_t
#include <stdint.h>         // for uint32_t and uint64_t

#define POLY 0xedb88320     // CRC polynomial

// Byte-wise CRC tables for forward and reverse calculations.
uint32_t crc_forward_table[256];
uint32_t crc_reverse_table[256];

// Fill in CRC tables using bit-wise calculations.
void crc32_make_tables(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t crc = n;
        for (int k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
        crc_forward_table[n] = crc;
        crc_reverse_table[crc >> 24] = (crc << 8) ^ n;
    }
}

// Return the forward CRC of buf[0..len-1], starting with crc at the front.
uint32_t crc32(uint32_t crc, unsigned char *buf, size_t len) {
    for (size_t n = 0; n < len; n++)
        crc = (crc >> 8) ^ crc_forward_table[(crc ^ buf[n]) & 0xff];
    return crc;
}

// Return the reverse CRC of buf[0..len-1], starting with crc at the end.
uint32_t crc32_reverse(uint32_t crc, unsigned char *buf, size_t len) {
    while (len)
        crc = (crc << 8) ^ crc_reverse_table[crc >> 24] ^ buf[--len];
    return crc;
}

// Put a 32-bit value into a byte buffer in little-endian order.
void put4(uint32_t word, unsigned char *pos) {
    pos[0] = word;
    pos[1] = word >> 8;
    pos[2] = word >> 16;
    pos[3] = word >> 24;
}

#include <stdlib.h>         // for random() and srandomdev()

// Fill dat[0..len-1] with uniformly random byte values. All of the bits from
// each random() call are used, except for possibly a few leftover at the end.
void ranfill(unsigned char *dat, size_t len) {
    uint64_t ran = 1;
    while (len) {
        if (ran < 0x100)
            ran = (ran << 31) + random();
        *dat++ = ran;
        ran >>= 8;
        len--;
    }
}

#include <stdio.h>          // for printf()

#define LEN 1024            // length of the message without the CRC

// Demonstrate the generation of a middle-CRC, using the forward and reverse
// CRC computations. Verify that the CRC of the resulting sequence is zero.
int main(void) {
    crc32_make_tables();
    srandomdev();               // BSD/macOS only; elsewhere seed with srandom() instead
    unsigned char dat[LEN+4];
    ranfill(dat, LEN/2);
    put4(0, dat + LEN/2);       // put zeros where the CRC will go
    ranfill(dat + LEN/2 + 4, (LEN+1)/2);
    put4(crc32(0, dat, LEN/2) ^ crc32_reverse(0, dat + LEN/2, (LEN+1)/2 + 4),
         dat + LEN/2);          // replace the zeros with the CRC
    printf("%08x\n", crc32(0, dat, LEN+4));
    return 0;
}
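
For the case in the question, where the field sits at a fixed offset rather than in the exact middle, the same idea could be sketched as below, together with the check the receiver would run. The XOR works because, for a CRC with no pre- or post-processing, feeding four bytes X into a CRC state s gives the same result as feeding four zero bytes into the state s XOR X (with X read as a little-endian word). The names mid_crc_embed, mid_crc_ok and CHUNK are hypothetical, and the chunked loop only imitates reading a large file piece by piece.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY 0xedb88320u    // same reflected CRC-32 polynomial as above
#define CHUNK 4096          // hypothetical read size for the receiver

static uint32_t fwd[256], rev[256];

// Build the forward and reverse byte-wise tables.
static void make_tables(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t crc = n;
        for (int k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
        fwd[n] = crc;
        rev[crc >> 24] = (crc << 8) ^ n;
    }
}

// Forward CRC, no pre- or post-processing; can be fed chunk by chunk.
static uint32_t crc_fwd(uint32_t crc, const unsigned char *buf, size_t len) {
    for (size_t n = 0; n < len; n++)
        crc = (crc >> 8) ^ fwd[(crc ^ buf[n]) & 0xff];
    return crc;
}

// Reverse CRC, walking from the end of buf back to its start.
static uint32_t crc_rev(uint32_t crc, const unsigned char *buf, size_t len) {
    while (len)
        crc = (crc << 8) ^ rev[crc >> 24] ^ buf[--len];
    return crc;
}

// Generator side: fill the 4-byte field at offset off so that the CRC of
// the whole buffer becomes zero.
void mid_crc_embed(unsigned char *file, size_t len, size_t off) {
    assert(off <= len && len - off >= 4);
    if (fwd[1] == 0)
        make_tables();
    for (int i = 0; i < 4; i++)
        file[off + i] = 0;
    uint32_t x = crc_fwd(0, file, off) ^ crc_rev(0, file + off, len - off);
    for (int i = 0; i < 4; i++)
        file[off + i] = (unsigned char)(x >> (8 * i));   // little-endian
}

// Receiver side: run the forward CRC over everything, chunk by chunk, and
// accept the data only if the result is zero.
int mid_crc_ok(const unsigned char *file, size_t len) {
    if (fwd[1] == 0)
        make_tables();
    uint32_t crc = 0;
    for (size_t pos = 0; pos < len; pos += CHUNK) {
        size_t n = len - pos < CHUNK ? len - pos : CHUNK;
        crc = crc_fwd(crc, file + pos, n);
    }
    return crc == 0;
}
```

The receiver does not need to know where the field is; it only checks that the overall CRC is zero. One caveat of a CRC with a zero initial value and no final XOR is that a buffer of all zero bytes also passes this check, so it may be worth rejecting that case separately.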
Mark Adler