Take your initial 1024-byte block A and your new 1024-byte block B. Exclusive-or them to get block C. Since you only changed four bytes, C will be bunch of zeros, four bytes which are the exclusive-or of the previous and new four bytes, and a bunch more zeros.
Now compute the CRC-16 of block C, but without any pre or post-processing. We will call that CRC-16'. (I would need to see the specific CRC-16 you're using to see what that processing is, if anything.) Exclusive-or the CRC-16 of block A with the CRC-16' of block C, and you now have the CRC-16 of block B.
At first glance, this may not seem like much of a gain compared to just computing the CRC of block B. However there are tricks to rapidly computing the CRC of a bunch of zeros. First off, the zeros preceding the four bytes that were changed give a CRC-16' of zero, regardless of how many zeros there are. So you just start computing the CRC-16' with the exclusive-or of the previous and new four bytes.
Now you continue to compute the CRC-16' on the remaining n zeros after the changed bytes. Normally it takes O(n) time to compute a CRC on n bytes. However if you know that they are all zeros (or all some constant value), then it can be computed in O(log n) time. You can see an example of how this is done in zlib's crc32_combine()
routine, and apply that to your CRC.
Given your CRC-16/DNP parameters, the zeros()
routine below will apply the requested number of zero bytes to the CRC in O(log n) time.
// Return a(x) multiplied by b(x) modulo p(x), where p(x) is the CRC
// polynomial, reflected. For speed, this requires that a not be zero.
uint16_t multmodp(uint16_t a, uint16_t b) {
uint16_t m = (uint16_t)1 << 15;
uint16_t p = 0;
for (;;) {
if (a & m) {
p ^= b;
if ((a & (m - 1)) == 0)
break;
}
m >>= 1;
b = b & 1 ? (b >> 1) ^ 0xa6bc : b >> 1;
}
return p;
}
// Table of x^2^n modulo p(x).
uint16_t const x2n_table[] = {
0x4000, 0x2000, 0x0800, 0x0080, 0xa6bc, 0x55a7, 0xfc4f, 0x1f78,
0xa31f, 0x78c1, 0xbe76, 0xac8f, 0xb26b, 0x3370, 0xb090
};
// Return x^(n*2^k) modulo p(x).
uint16_t x2nmodp(size_t n, unsigned k) {
k %= 15;
uint16_t p = (uint16_t)1 << 15;
for (;;) {
if (n & 1)
p = multmodp(x2n_table[k], p);
n >>= 1;
if (n == 0)
break;
if (++k == 15)
k = 0;
}
return p;
}
// Apply n zero bytes to crc.
uint16_t zeros(uint16_t crc, size_t n) {
return multmodp(x2nmodp(n, 3), crc);
}