What is the appropriate checksum algorithm to use for preventing modification of code?

Question

I will explain what I intend to do. I have a function, or a set of lines of code (in C), to protect: basically, no one should be able to modify its instructions. So I have code (called a checksum guard) that operates on the x86 assembly generated from the C file. It picks up one instruction from that assembly and adds it to a checksum value (initially 0), or applies Fletcher's algorithm or some other function to it, and repeats this until all instructions have been processed. Then, to see whether any instruction has been tampered with, I compute the sum (or Fletcher checksum, etc.) of all instructions and compare it against the precomputed checksum value. Which methods are appropriate for this?
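For reference, here is a minimal C sketch of the two kinds of checksum the question mentions, computed over a buffer of machine-code bytes. The function names and the choice of Fletcher-16 (two sums modulo 255) are illustrative assumptions; as the comments and answers below explain, neither checksum resists a deliberate attacker.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain additive checksum: sum of all bytes, wrapping modulo 2^32. */
uint32_t additive_checksum(const uint8_t *code, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += code[i];
    return sum;
}

/* Fletcher-16: two running sums modulo 255. The second sum accumulates
   the first, so the result depends on byte order; the plain sum does not. */
uint16_t fletcher16(const uint8_t *code, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + code[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

Swapping two instructions leaves the additive sum unchanged but changes the Fletcher value; that order sensitivity is essentially all that Fletcher adds over a plain sum.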

Here is the research paper which talks about this technique:

https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2001-49.pdf

Here is the guard template:

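```asm
guard:
      add  ebp, -checksum     ; subtract the precomputed checksum from ebp
      mov  eax, client_addr   ; eax walks the protected (client) code
for:
      cmp  eax, client_end    ; client_end = address of the last dword checked
      ja   end                ; unsigned compare: addresses are unsigned
      mov  ebx, dword [eax]   ; load the next 4 bytes of code
      add  ebp, ebx           ; accumulate them into ebp
      add  eax, 4
      jmp  for
end:                          ; ebp is unchanged if the code still sums to checksum
```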
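To make the template's arithmetic explicit, here is a sketch of the same loop in C. The names `guard_residue`, `region` and `nwords` are hypothetical stand-ins for the guard, `client_addr`, and the number of dwords up to `client_end`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C rendering of the guard loop (a sketch, not the paper's code).
   Returns (sum of the 32-bit words) - checksum, modulo 2^32. */
uint32_t guard_residue(const uint8_t *region, size_t nwords, uint32_t checksum)
{
    uint32_t acc = 0u - checksum;                    /* add ebp, -checksum */
    for (size_t i = 0; i < nwords; i++) {
        uint32_t word;
        memcpy(&word, region + 4 * i, sizeof word);  /* mov ebx, dword [eax] */
        acc += word;                                 /* add ebp, ebx (wraps) */
    }
    return acc;  /* 0 exactly when the words sum to checksum mod 2^32 */
}
```

Calling it once with a checksum of 0 gives the value to precompute; at run time a nonzero residue means the region changed. In the assembly template the residue is added onto ebp, so ebp keeps its previous value exactly when the residue is 0.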

Fletcher's checksum is **not** a security checksum (you have "security" as one of your tags). It provides a little protection against random errors, but no protection against malicious attacks because any attacker can modify your data and compute the corresponding checksum for his modification. If you want a security checksum, then message authentication codes (MACs) or digital signatures are required. To answer your question, from a mathematical perspective, I do think 256 is a better choice than 255. — TheGreatContini, May 01 '16 at 22:56
I modified the question, and hence the description has changed. I read about message authentication, and it's mentioned that MACs, digital signatures, and authenticated encryption are the ways to go. What do you think would be the best method? — white-hawk-73, May 02 '16 at 06:34

score 1 · Answer 1 · answered May 02 '16 at 21:56

The research paper you are citing is from the founders of Arxan technologies. They have patents on protecting code in this way, where protection is about preventing people from reverse engineering. Many years prior to Arxan, Intertrust had some similar technologies. I have not studied Arxan in enough depth to understand what is novel about what they are doing, and I cannot comment on the legalities of the patents.

You originally phrased the question as a security question without giving the context on how it is security. You have now re-written it (thanks!) to make the security context more clear. You are interested in preventing code modification and/or reverse engineering.

Techniques to prevent code modification and/or reverse engineering are based upon obfuscation and self-checking. Security purists will never call obfuscation "security", but in practice it does make a big difference to slow down hackers from reverse engineering software.

Going back to your question, you ask whether one should use a checksum, digital signature, or a MAC for this type of protection. I'd recommend a cryptographic hash function instead. Here's why:

- A simple checksum is easy for a hacker to bypass. All he has to do is add, after his modified code, an instruction that is never executed and whose value cancels out the change his modification made to the checksum.
- Digital signatures and MACs are based upon secrets, and in theory a hacker can always find those secrets in your code. This research paper showed how to do that many years ago (and it is practical and works!). Once the secrets are found, a digital signature or MAC behaves essentially like a cryptographic hash, so the real reason to avoid these tools is that they are overkill for the problem you are trying to solve.
- Cryptographic hashes solve the same problem that checksums do, but they make it computationally infeasible to attack them the way one would attack a normal checksum: if attackers modify the code, they also have to modify the stored checksum. In other words, there is no simple way to cancel out modifications of the code by appending an instruction that does nothing. (If you did find such a way, you would have found a second preimage, and therefore a collision, for the hash function, which means you broke it.)
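To make the simple-checksum point above concrete, here is a small C sketch of the cancellation attack against an additive guard: the attacker overwrites one 32-bit word and then rewrites a word that is never executed so that the total is unchanged. The word layout and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same additive checksum as the guard: sum of 32-bit words, mod 2^32. */
uint32_t sum_words(const uint8_t *code, size_t nwords)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t w;
        memcpy(&w, code + 4 * i, sizeof w);
        sum += w;
    }
    return sum;
}

/* Hypothetical attack: overwrite word `target` with `patch`, then rewrite
   word `pad` (assumed never executed, e.g. padding after a ret) so that
   the total sum is unchanged. Requires target != pad. */
void patch_and_compensate(uint8_t *code, size_t target, uint32_t patch, size_t pad)
{
    uint32_t old_target, old_pad;
    memcpy(&old_target, code + 4 * target, sizeof old_target);
    memcpy(&old_pad, code + 4 * pad, sizeof old_pad);
    /* new_pad - old_pad == old_target - patch, so the two changes cancel */
    uint32_t new_pad = old_pad + (old_target - patch);
    memcpy(code + 4 * target, &patch, sizeof patch);
    memcpy(code + 4 * pad, &new_pad, sizeof new_pad);
}
```

The same trick works for any checksum whose effect of a change can be computed and undone, which is exactly the property a cryptographic hash is designed not to have.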

Despite these points, a single check in the software can still be bypassed in various ways. That's why you need guards to guard the guard, and guards above that, and so on, and you also need to guard your checksums (cryptographic hash output values), guard above that, and so on. In the end, if you want practical protection against code modification and reverse engineering, this strategy leads into a minefield of intellectual property from Arxan and/or Intertrust.
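The "guards guarding guards" idea can be sketched in C with a toy memory image: guard A checks the client code, and guard B checks both the client code and guard A's stored checksum. An attacker who patches the code and then fixes up A's stored value is still caught by B. The layout, the names, and the byte-sum stand-in for a real hash are all hypothetical; with a sum this weak, the detection shown holds for this example rather than in general.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a real hash (the answer recommends a cryptographic one). */
static uint32_t region_sum(const uint8_t *p, size_t len)
{
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += p[i];
    return s;
}

/* Hypothetical image layout:
   [0, 16)  client code
   [16, 20) guard A's stored checksum of the client
   [20, 24) guard B's stored checksum of [0, 20): client AND A's value */
enum { CLIENT_LEN = 16, A_SLOT = 16, B_SLOT = 20, IMAGE_LEN = 24 };

static uint32_t load32(const uint8_t *img, size_t off)
{
    uint32_t v;
    memcpy(&v, img + off, sizeof v);
    return v;
}

static void store32(uint8_t *img, size_t off, uint32_t v)
{
    memcpy(img + off, &v, sizeof v);
}

/* Compute A first, then B, since B's region includes A's stored value. */
void install_guards(uint8_t *img)
{
    store32(img, A_SLOT, region_sum(img, CLIENT_LEN));
    store32(img, B_SLOT, region_sum(img, A_SLOT + 4));
}

int guard_a_ok(const uint8_t *img) { return region_sum(img, CLIENT_LEN) == load32(img, A_SLOT); }
int guard_b_ok(const uint8_t *img) { return region_sum(img, A_SLOT + 4) == load32(img, B_SLOT); }
```

An attacker can of course also rewrite B's stored value, which is why the answer talks about guarding above that, and so on.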

Thank you for your detailed answer. The thing is, as part of my project I have to protect a particular piece of C code, and I know that there can be many 'correct' or sufficiently-difficult-to-break guarding frameworks. But right now, to make at least some progress (even if just a little), I wanted to add something to that Arxan paper (which I have mentioned): instead of using a simple addition on the opcodes, I was looking for another function. I have 2 questions; it would be great if you could answer them. — white-hawk-73, May 03 '16 at 06:21
1. Which cryptographic hash function do you think would be suitable? As the code I want to protect is not huge (100-200 lines), I don't want the runtime to blow up by adding a complicated hash function. 2. Do you think that after adding the checksum functionality (using any cryptographic hash function: SHA-256, MD5, etc.), I should obfuscate the code as well? Thanks in advance. — white-hawk-73, May 03 '16 at 06:25
Also, I have to implement this hash function in x86 assembly, as I am picking up assembly instructions from the code to protect one by one and applying the hash function to their opcodes. Is it too difficult to implement this in assembly? — white-hawk-73, May 03 '16 at 13:49
@ak0817: 1. SHA-256 is a good choice, and yes, you should obfuscate the code as well (for example, you wouldn't want an attacker to modify your checksum code, so it needs to be hidden). 2. How difficult it is to implement in assembly depends upon how much time and effort you want to spend. I expect that I could do it in one day if I had the assembly language fresh in my mind, though I have a lot of experience with this crypto. Alternatively, you could get the assembly code from somewhere else. Keep in mind that a strong hash by itself offers little value; you need to do everything else too! — TheGreatContini, May 03 '16 at 21:55
As long as the result of the checksum or hash is stored somewhere in the program, an attacker can modify both the code and the calculated checksum/hash. — Roland Smith, May 04 '16 at 15:55
@RolandSmith that's right, and that's why Intertrust and Arxan have several levels of checks within the software, i.e. guards that guard other guards and checksums. We all know that an attacker who spends enough effort can always win, but in practice, they have made reverse engineering very time-consuming, and have built multi-million-dollar businesses out of these concepts. — TheGreatContini, May 04 '16 at 18:34
@TheGreatContini What if you skip assembly analysis entirely? For instance, you could run the code on a virtual machine / emulator and observe what it actually *does*, as in memory or files/devices read/written to and APIs called. Whatever the obfuscation efforts put into the code, the *effect* of the code should be the same, shouldn't it? Moreover, in such an emulator you could inspect the code *as it is being run*. — Roland Smith, May 04 '16 at 19:04
@RolandSmith: Absolutely. I am aware of people doing such stuff today to decrypt "encrypted strings" in obfuscated code. And that's why checksums are only part of what Arxan offers in their solution (see their website), because they do a lot of other obfuscation stuff as well. In the end, I fully agree that the person reverse engineering is going to win, if he puts enough effort into it. What Arxan is doing is making that person put a lot of effort into it to win. As far as I am aware, nobody has succeeded in breaking an Arxan obfuscated program. — TheGreatContini, May 04 '16 at 21:46
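On the SHA-256 suggestion in the comments above and the question of how hard it is to implement: the algorithm itself is short. Below is a plain, unoptimized C sketch of SHA-256 as specified in FIPS 180-4, hashing a buffer in one shot; `sha256` and `sha256_block` are hypothetical names, not a library API. A C version like this can serve as a reference when writing or checking an x86 assembly version, but, as the comments say, the hash alone protects nothing unless the guard code and stored hash values are protected too.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4, section 4.2.2). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Compress one 64-byte block into the state h. */
static void sha256_block(uint32_t h[8], const uint8_t *blk)
{
    uint32_t w[64];
    for (int t = 0; t < 16; t++)   /* message words are big-endian */
        w[t] = ((uint32_t)blk[4 * t] << 24) | ((uint32_t)blk[4 * t + 1] << 16) |
               ((uint32_t)blk[4 * t + 2] << 8) | (uint32_t)blk[4 * t + 3];
    for (int t = 16; t < 64; t++) {   /* message schedule */
        uint32_t s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = w[t - 16] + s0 + w[t - 7] + s1;
    }
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int t = 0; t < 64; t++) {   /* 64 rounds */
        uint32_t t1 = hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[t] + w[t];
        uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* One-shot SHA-256 of len bytes at msg; writes the 32-byte digest to out. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    size_t i = 0;
    for (; len - i >= 64; i += 64)   /* all full blocks */
        sha256_block(h, msg + i);

    uint8_t last[64] = {0};          /* padding: 0x80, zeros, 64-bit bit length */
    size_t rem = len - i;
    if (rem > 0)
        memcpy(last, msg + i, rem);
    last[rem] = 0x80;
    if (rem >= 56) {                 /* no room for the length: one extra block */
        sha256_block(h, last);
        memset(last, 0, sizeof last);
    }
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)
        last[63 - j] = (uint8_t)(bits >> (8 * j));
    sha256_block(h, last);

    for (int j = 0; j < 8; j++) {    /* serialize the state big-endian */
        out[4 * j]     = (uint8_t)(h[j] >> 24);
        out[4 * j + 1] = (uint8_t)(h[j] >> 16);
        out[4 * j + 2] = (uint8_t)(h[j] >> 8);
        out[4 * j + 3] = (uint8_t)h[j];
    }
}
```

In a guard, the digest of the protected region would then be compared with a stored 32-byte value, instead of the single dword used by the additive template.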

score 1 · Answer 2 · edited May 23 '17 at 12:31


There is no real foolproof way to do this in software. Sufficiently motivated and knowledgeable attackers can generally defeat such schemes as long as they have access to the program binary.

The only thing you can do is make it harder for them, using obfuscation techniques, self-modifying code, hardware keys, you name it.

But keep in mind that such tricks will, in general, do more to annoy legitimate customers than to stop serious attackers.

Update:

For examples, see e.g. An idiot guide to writing polymorphic engines, the answer to this question, and this forum thread. Search for "x86 code obfuscation" and you'll find lots more.


answered May 02 '16 at 23:19

Roland Smith


Thank you. Can you suggest a proper obfuscation approach? I mean, I read about inserting dummy code, changing the order of instructions, etc., but there does not seem to be an algorithm. – white-hawk-73 May 03 '16 at 06:16
@ak0817 Added some links to obfuscation techniques. – Roland Smith May 03 '16 at 17:24
Thank you. I am just a beginner in obfuscation. I will try to understand whatever I can. – white-hawk-73 May 04 '16 at 08:43
