Debug checksum algorithm written in x86 16-bit assembly

Question

I'm currently reverse engineering a piece of software that computes a 2-byte checksum for a given buffer of data. The code comes from a 16-bit DLL (NE format) compiled with Borland C++. I suspect the checksum is a CRC-16 with a polynomial of 0x8408, but I have had no luck computing an identical CRC, so I wonder whether the implementation is a "standard" CRC-16 or not.

Here's the assembly implementation:

crc_cal proc    far

var_4= word ptr -4
arg_0= word ptr  6
arg_2= dword ptr  8

mov ax, seg dseg37
inc bp
push    bp
mov bp, sp
push    ds
mov ds, ax
sub sp, 2
push    si
push    di
xor cx, cx
mov dx, 0FFFFh
mov [bp+var_4], 8408h

loc_42646:
les bx, [bp+arg_2]
add bx, cx
mov al, es:[bx]
xor al, dl
mov dl, al
inc cx
xor di, di
jmp short loc_42672

loc_42657:
mov si, dx
dec si
mov ax, si
shr ax, 1
mov si, ax
mov ax, dx
shr ax, 1
mov dx, ax
cmp si, dx
jnz short loc_42671
mov ax, dx
xor ax, [bp+var_4]
mov dx, ax

loc_42671:
inc di

loc_42672:
cmp di, 8
jb  short loc_42657
cmp cx, [bp+arg_0]
jb  short loc_42646
mov ax, dx
not ax
mov dx, ax
les bx, [bp+arg_2]
add bx, cx
mov es:[bx], dl
inc cx
mov ax, dx
shr ax, 8
mov dx, ax
les bx, [bp+arg_2]
add bx, cx
mov es:[bx], dl
inc cx
pop di
pop si
pop cx
pop ds
pop bp
dec bp
retf
crc_cal endp

And some data with the associated CRC (last two bytes), as computed by the software:

| DATA | Inc | CRC |
|---|---|---|
| 00 00 00 00 00 00 01 ef f7 fe ef ff fd ef fb fa fd a2 aa 21 | 01 | f4 e0 |
| 00 00 00 00 00 00 01 ef f7 fd ef ff fd fe fb fa fd a2 aa 21 | 02 | f4 d1 |
| 00 00 00 00 00 00 01 f7 fe fd fd ff fd df ff fb fd a2 aa 21 | 03 | f4 cd |
| 00 00 00 00 00 00 01 f7 fe fe fd ff f7 ef ff fa fd a2 aa 21 | 04 | f4 c2 |
| 00 00 00 00 00 00 01 ef f7 fe ef ff fe ef fb fa fd a2 aa 21 | 05 | f4 db |
| 00 00 00 00 00 00 01 ef f7 fe ef ff fd ef fb fa fd a2 aa 21 | 06 | f4 db |

There's no standard CRC-16 implementation, and even for a given polynomial there are several ways it can be implemented that will give different answers for the same input. See: https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Specification — Ross Ridge, Nov 25 '19 at 22:50
@Jester I didn't, I have no idea how I can compile this or at least use it in a C/C++ program — Spacebrain, Nov 25 '19 at 22:58
So if you haven't tested it how do you know your CRC16 is wrong? — Jester, Nov 25 '19 at 22:59
@RossRidge I wasn't aware of that, thanks. I knew there were different parameters (poly, reflection, XOR) but I thought the algorithm was common to all implementations — Spacebrain, Nov 25 '19 at 23:00
@Jester I have a set of data and associated checksums generated by the software. What I mean by "my CRC is wrong" is that I can't find the parameters that would let me compute a CRC on my own that matches the ones generated by the software. — Spacebrain, Nov 25 '19 at 23:02
So, the answer to my question was then YES, you do have some test data with known output ... so how about you give an example so we can see if we can do better? — Jester, Nov 25 '19 at 23:03
The implementation appears to XOR the final result with 0FFFFh with the `not ax` line. I don't know enough about the mathematics of CRCs to be sure what bit or byte order it's using. — Ross Ridge, Nov 25 '19 at 23:10
@Jester Sorry, I misunderstood your question. I updated my question. — Spacebrain, Nov 25 '19 at 23:18
Hmmm, I have run the code but it does not seem to produce the values you showed. For the first line I get `e6 ef`. Also it's very suspicious that all the checksums have `f4` ... I would expect the CRC to change a lot. Are you sure it's 20 bytes of data + 2 bytes of checksum, directly calculated with this code? Heck, the last two lines have the same checksum even though they differ in a byte? That would be an insane coincidence. — Jester, Nov 25 '19 at 23:27
@Jester I edited the data I've posted: there is an incremental byte between the data and the CRC, but I'm unsure whether it's part of the data used for the CRC calculation. I also included another procedure which also seems to perform a CRC calculation, but it makes calls to unknown `Buf::` functions. I also noticed the first byte of the checksums is always the same, which doesn't make sense to me either... — Spacebrain, Nov 25 '19 at 23:37
The other function seems to be doing the exact same thing, just taking the data from that `Buf` class. Running the code on the data you provided, including or excluding the `inc` field, does not produce the shown checksum values. While we could work out what CRC the functions implement if they themselves don't produce the required output, that would be a pointless exercise. — Jester, Nov 25 '19 at 23:42
@Jester It's frustrating, I really don't see how the checksum computed by the above function can be different. Could you tell me how you ran the first function to test it with the data I've posted? That way, if I can grab more data, I could test it. — Spacebrain, Nov 25 '19 at 23:47
I assembled it and ran it in dosbox. It takes the data size and a far pointer to the bytes as argument. It puts the checksum at the end, as in your example. — Jester, Nov 25 '19 at 23:53
You removed the version that required 386 for `eax`. The sentence I added in my edit is now wrong; there is no `movzx`. It still won't run on 8086, `shr ax, 8` requires 186 IIRC. — Peter Cordes, Nov 26 '19 at 09:47
@PeterCordes I removed the second function because from what I've seen and understood, the checksum calculation was identical to the first function. So it became a bit redundant. — Spacebrain, Nov 26 '19 at 10:10
Ok, and I'm reminding you to update the rest of your question to match the current state. — Peter Cordes, Nov 26 '19 at 10:12

rcgldr · Accepted Answer · 2019-11-27T02:02:06.670

The data shown doesn't correspond to a CRC, as noted in this prior answer:

Find used CRC-16 algorithm

The code is an overly complex implementation of a right shifting CRC (in dx), poly = 0x8408, initial value = 0xffff, xor out = 0xffff. Check the next 2 bytes after each line to see if that is where the CRC is appended.
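As a rough illustration of those parameters, here is a minimal C sketch of a bitwise right-shifting CRC with poly 0x8408, initial value 0xFFFF and xor out 0xFFFF. The names `crc16_8408` and `crc_trailer_ok` are hypothetical and do not come from the DLL. To my knowledge this parameter set is the one commonly catalogued as CRC-16/X-25, for which the usual check string "123456789" gives 0x906E. The second helper assumes the CRC is appended least significant byte first, as the assembly does.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise right-shifting CRC-16: poly 0x8408, init 0xFFFF, xor out 0xFFFF.
   Hypothetical name; a sketch of the parameters, not the DLL's own source. */
uint16_t crc16_8408(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;                      /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                         /* fold byte into low 8 bits */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0x8408);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return (uint16_t)~crc;                      /* xor out = 0xFFFF */
}

/* Returns 1 if the last two bytes of buf hold the CRC of the preceding
   bytes, least significant byte first (assumed storage order). */
int crc_trailer_ok(const uint8_t *buf, size_t total_len)
{
    if (total_len < 2)
        return 0;
    uint16_t crc = crc16_8408(buf, total_len - 2);
    return buf[total_len - 2] == (crc & 0xFF) &&
           buf[total_len - 1] == (crc >> 8);
}
```

Calling `crc_trailer_ok` on a captured record, with and without the increment byte, is one way to test where (or whether) the CRC is actually stored.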

Here is the question's code with comments. Thanks to Ross Ridge for explaining that the "inc bp" is used to indicate a far call was involved, in case the stack needs to be backwalked (the "dec bp" at the end undoes the "inc bp" at the start).

crc_cal proc    far

var_4   =       word ptr -4     ; used to store poly
arg_0   =       word ptr  6     ; number of bytes of data
arg_2   =       dword ptr 8     ; far pointer to data

        mov     ax, seg dseg37  ; for ds that is never used
        inc     bp              ; bp += 1, (bp&1 == far call indicator)
        push    bp              ; save bp+1
        mov     bp, sp          ; bp = sp, base for the equated offsets
        push    ds              ; save ds
        mov     ds, ax          ; ds = dseg37  (never used)
        sub     sp, 2           ; allocate space for poly (var_4)
        push    si              ; save si, di
        push    di
        xor     cx, cx          ; cx = offset to data
        mov     dx, 0FFFFh      ; dx = initial crc
        mov     [bp+var_4], 8408h ;store poly

loc_42646:
        les     bx, [bp+arg_2]  ; al = next byte of data
        add     bx, cx
        mov     al, es:[bx]
        xor     al, dl          ; crclo ^= data
        mov     dl, al
        inc     cx              ; increment offset to data
        xor     di, di          ; di = bit counter (0 to 7)
        jmp     short loc_42672

loc_42657:
        mov     si, dx          ; si = (crc-1)>>1
        dec     si              ;  if lsb was 0, then
        mov     ax, si          ;  si != dx later on
        shr     ax, 1
        mov     si, ax
        mov     ax, dx          ; dx = (crc)>>1
        shr     ax, 1
        mov     dx, ax
        cmp     si, dx          ; br if prior lsb of crc was 0
        jnz     short loc_42671
        mov     ax, dx          ; crc ^= 0x8408
        xor     ax, [bp+var_4]
        mov     dx, ax

loc_42671:
        inc     di              ; increment bit counter

loc_42672:
        cmp     di, 8           ; loop till byte done
        jb      short loc_42657
        cmp     cx, [bp+arg_0]  ; loop till all bytes done
        jb      short loc_42646
        mov     ax, dx          ; dx = ~ crc
        not     ax
        mov     dx, ax
        les     bx, [bp+arg_2]  ; append crc to data, lsbyte first
        add     bx, cx
        mov     es:[bx], dl
        inc     cx
        mov     ax, dx
        shr     ax, 8
        mov     dx, ax
        les     bx, [bp+arg_2]
        add     bx, cx
        mov     es:[bx], dl
        inc     cx              ; useless, cx gets overwritten below
        pop     di              ; restore di, si
        pop     si
        pop     cx              ; cx = poly
        pop     ds              ; restore ds, bp
        pop     bp
        dec     bp              ; bp -= 1 (undo inc bp from above)
        retf
crc_cal endp
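
For readers who prefer C, the commented listing above can be transliterated roughly as follows. This is a sketch under assumptions: the name `crc_cal_c` and the parameter order (length first, then pointer, guessed from the stack offsets and the plain `retf`) are mine, and the far pointer is replaced by an ordinary pointer. The buffer must have room for two extra bytes after the data. The loop is written as `do ... while` because the assembly tests the byte count only after processing a byte.

```c
#include <stdint.h>

/* Hypothetical C rendering of crc_cal. buf must hold len + 2 bytes:
   the CRC is appended after the data, least significant byte first. */
void crc_cal_c(uint16_t len, uint8_t *buf)
{
    const uint16_t poly = 0x8408;       /* var_4 */
    uint16_t crc = 0xFFFF;              /* dx: initial crc */
    uint16_t i = 0;                     /* cx: offset into data */

    do {
        crc ^= buf[i];                  /* xor al, dl / mov dl, al */
        i++;
        for (uint16_t bit = 0; bit < 8; bit++) {
            /* si = (crc - 1) >> 1; equals crc >> 1 only if lsb was 1 */
            uint16_t tmp = (uint16_t)((uint16_t)(crc - 1) >> 1);
            crc >>= 1;
            if (tmp == crc)
                crc ^= poly;
        }
    } while (i < len);                  /* cmp cx, [bp+arg_0] / jb */

    crc = (uint16_t)~crc;               /* not ax */
    buf[i++] = (uint8_t)(crc & 0xFF);   /* low byte first */
    crc >>= 8;
    buf[i++] = (uint8_t)(crc & 0xFF);   /* then high byte */
}
```

With the nine ASCII bytes "123456789" in an 11-byte buffer, this sketch should append 6E 90, matching the 0x906E check value usually quoted for these CRC parameters.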

It's overly complex because it's compiled C code and I guess the table-based version was considered too big. — Ross Ridge, Nov 26 '19 at 20:42
@RossRidge - if it's compiled C code, it's still overly complex, and not the typical code you see such as `crc = (crc&1)? (crc>>1)^poly : (crc>>1); ` , instead it's something like `tmp = (crc-1)>>1;` | `crc = (crc >>1);` | `if (tmp == crc) crc = crc^poly;` . — rcgldr, Nov 26 '19 at 22:10
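
The two update steps compared in the comment above can be checked against each other exhaustively. The following is a small sketch with hypothetical function names, assuming 16-bit unsigned arithmetic as in the assembly. It relies on the fact that subtracting 1 leaves `crc >> 1` unchanged exactly when the low bit of `crc` was set.

```c
#include <stdint.h>

/* The typical form: test the low bit, then shift and conditionally xor. */
static uint16_t step_usual(uint16_t crc, uint16_t poly)
{
    return (crc & 1) ? (uint16_t)((crc >> 1) ^ poly) : (uint16_t)(crc >> 1);
}

/* The form seen in the disassembly: compare (crc-1)>>1 with crc>>1. */
static uint16_t step_compiled(uint16_t crc, uint16_t poly)
{
    uint16_t tmp = (uint16_t)((uint16_t)(crc - 1) >> 1);
    crc >>= 1;
    if (tmp == crc)
        crc ^= poly;
    return crc;
}

/* Returns 1 if both forms agree for every 16-bit crc value. */
int steps_agree(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        if (step_usual((uint16_t)v, 0x8408) != step_compiled((uint16_t)v, 0x8408))
            return 0;
    }
    return 1;
}
```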
If you were wondering what incrementing and decrementing of BP is for: https://devblogs.microsoft.com/oldnewthing/20110316-00/?p=11203 — Ross Ridge, Nov 27 '19 at 01:08
