How effective is obfuscation?

Question

A different question, i.e. Best .NET obfuscation tools/strategy, asks whether obfuscation is easy to implement using tools.

My question though is, is obfuscation effective? In a comment replying to this answer, someone said that "if you're worried about source theft ... obfuscation is almost trivial to a real cracker".

I've looked at the output from the Community Edition of Dotfuscator, and it looks obfuscated to me! I wouldn't want to maintain that!

I understand that simply 'cracking' obfuscated software might be relatively easy: because you only need to find whichever location in the software implements whatever it is you want to crack (typically the license protection), and add a jump to skip that.

If the worry is more than just cracking by an end-user or a 'pirate' though: if the worry is "source theft" i.e. if you're a software vendor, and your worry is another vendor (a potential competitor) reverse-engineering your source, which they could then use in or add to their own product ... to what extent is simple obfuscation an adequate or inadequate protection against that risk?

1st edit:

The code in question is about 20 KLOC which runs on end-user machines (a user control, not a remote service).

If obfuscation really is "almost trivial to a real cracker", I'd like some insight into why it's ineffective (and not just "how much" it's not effective).

2nd edit:

I'm not worried about someone's reversing the algorithm: more worried about their repurposing the actual implementation of the algorithm (i.e. the source code) into their own product.

Figuring that 20 KLOC is several months' work to develop, would it take more or less than this (several months) to deobfuscate it all?

Is it even necessary to deobfuscate something in order to 'steal' it: or might a sane competitor simply incorporate it wholesale into their product while still obfuscated, accept that as-is it's a maintenance nightmare, and hope that it needs little maintenance? If this scenario is a possibility then is obfuscated .Net code any more vulnerable to this than compiled machine code is?

Is the obfuscation "arms race" aimed mostly at preventing people from even 'cracking' something (e.g. finding and deleting the code fragment which implements licensing protection/enforcement), more than at preventing 'source theft'?

Indeed, unless you have something like JavaScript, your source code, as such, no longer exists in the program, just the instruction set generated from the code. Your source is the blueprint, but the final product doesn't include the blueprint. — Robert Gould, Feb 16 '09 at 02:40
So the question is like asking whether putting curtains over my building will stop people from stealing my blueprints ... there is no connection between the two in this case. JavaScript and some other languages process the code as-is, or use bytecode line by line like Python or Lua, but those are not compiled languages. — Robert Gould, Feb 16 '09 at 02:42
Robert, using .Net (e.g. C#) your source code *is* still in the program (minus only any comments): it's eye-opening to run http://www.red-gate.com/products/reflector/ on a compiled program. — ChrisW, Feb 16 '09 at 03:05
But that just shows classes and functions (if I'm correct), not the implementation, although you can get a call stack anyways. — Robert Gould, Feb 16 '09 at 03:07
Reflector isn't only able to show the declarations of the classes and methods: it can also show the implementation (body) of each method too. — ChrisW, Feb 16 '09 at 03:13
Well I can see the same stepping through a debugger anyways. And using call graph programs, you can get the same kind of info from x86 assembly, Red-gate just makes it prettier. There is no way you can make something unreadable to the machine, because then it wouldn't run. — Robert Gould, Feb 16 '09 at 03:16
The whole point is, is your algorithm really that unique? If it is, and I hate to say this, you patent it, just like Microsoft, Fmod or GIF people. The legal route is the only way you have to protect that kind of stuff. — Robert Gould, Feb 16 '09 at 03:19
With a native debugger you see un-named machine addresses, not symbols. With Reflector, you see source code. With obfuscation, you see obfuscated source code. Someone said that obfuscation is 'trivial', another that obfuscation makes it an order of magnitude harder than writing from scratch. — ChrisW, Feb 16 '09 at 03:23
Ok let me make an assumption here, you have 20KLOC, but I bet that the real important stuff is 2KLOC or less (<10%), the other 18K are trivial semi-boiler-plate stuff not worth stealing at all. The problem is when you are talking about 2KLOC trivial and magnitude aren't that different anymore. — Robert Gould, Feb 16 '09 at 03:31
Now theoretically Obfuscation will make it harder to find those 2KLOC, but in reality, programs have output, and work in a responsive manner, so with a little human ingenuity it shouldn't be hard to reduce the 20K search space to 5KLOC, and finding 2K in 5K is trivial. — Robert Gould, Feb 16 '09 at 03:33
That's an unjustified assumption. People have talked about obfuscation (why, how it works, and how people crack it), but I don't know *how effective* it is: e.g. is it "trivial", or is it "an order of magnitude harder than to rewrite from scratch"? Is there any publicly-available quantified evidence? — ChrisW, Feb 16 '09 at 03:39
Ok a competent cracker can reverse engineer your system in a day, no matter the obfuscation (because they have tools for that already, like ripping DVDs). If you have your unique obfuscation system, then it'll take them the time it takes to write the tool. Normally a few days to a few weeks. — Robert Gould, Feb 16 '09 at 05:33
Now if the cracker estimates that the return on writing the custom tool is worth it, say someone is paying them, or they personally will greatly benefit from your code instead of using some other open-source or cracked-source solution, then it's a week of work. — Robert Gould, Feb 16 '09 at 05:36

score 44 · Accepted Answer · edited Jul 15 '19 at 08:27

I've discussed why I don't think Obfuscation is an effective means of protection against cracking here:
Protect .NET Code from reverse engineering

However, your question is specifically about source theft, which is an interesting topic. In Eldad Eilam's book, "Reversing: Secrets of Reverse Engineering", the author discusses source theft as one reason behind reverse engineering in the first two chapters.

Basically, what it comes down to is the only chance you have of being targeted for source theft is if you have some very specific, hard to engineer, algorithm related to your domain that gives you a leg up on your competition. This is just about the only time it would be cost-effective to attempt to reverse engineer a small portion of your application.

So, unless you have some top-secret algorithm you don't want your competition to have, you don't need to worry about source theft. The cost involved with reversing any significant amount of source-code out of your application quickly exceeds the cost of re-writing it from scratch.

Even if you do have some algorithm you don't want them to have, there isn't much you can do to stop determined and skilled individuals from getting it anyway (if the application is executing on their machine).

Some common anti-reversing measures are:

Obfuscating - Doesn't do much in terms of protecting your source or preventing it from being cracked. But we might as well not make it totally easy, right?
3rd Party Packers - Themida is one of the better ones. Packs an executable into an encrypted win32 application. Prevents reflection if the application is a .NET app as well.
Custom Packers - Sometimes writing your own packer, if you have the skill to do so, is effective because there is very little information in the cracking scene about how to unpack your application. This can stop inexperienced reverse engineers. This tutorial gives some good information on writing your own packer.
Keep industry-secret algorithms off the user's machine. Execute them as a remote service so the instructions are never executed locally. The only "fool-proof" method of protection.
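To make the first item concrete, here is a minimal sketch of what a renaming obfuscator does. It is written in Python rather than .NET purely for illustration; the `Renamer` class, the `obfuscate` helper and the `weighted_total` sample are all hypothetical, and real obfuscators such as Dotfuscator do considerably more than this. The point it shows is that the names disappear while the logic and control flow stay exactly as they were.

```python
import ast

# Hypothetical sample routine to be obfuscated.
SOURCE = """
def weighted_total(prices, discount):
    total = 0
    for price in prices:
        total = total + price * (100 - discount)
    return total // 100
"""


class Renamer(ast.NodeTransformer):
    """Replace function, argument and local variable names with opaque ones."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # The same original name always maps to the same opaque name.
        return self.mapping.setdefault(name, f"a{len(self.mapping)}")

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename only names defined in the sample; builtins are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node


def obfuscate(source):
    """Return (obfuscated source text, original-name -> opaque-name mapping)."""
    renamer = Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping  # ast.unparse needs Python 3.9+


if __name__ == "__main__":
    text, names = obfuscate(SOURCE)
    print(text)
```

The printed result is unpleasant to maintain, but the loop, the arithmetic and the return value are all still there for anyone who wants to read them.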

However, packers can be unpacked, and obfuscation doesn't really hinder those who want to see what your application is doing. If the program is run on the user's machine then it is vulnerable.

Eventually its code must be executed as machine code, and it is normally a matter of firing up a debugger, setting a few breakpoints, monitoring the instructions being executed during the relevant action, and spending some time poring over this data.
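As a rough illustration of that kind of runtime observation, the sketch below uses Python's `sys.settrace` as a stand-in for a debugger. The function `a0` is a made-up "obfuscated" routine and `trace_calls` is a hypothetical helper; a real session against a .NET or native binary would use an actual debugger, but the idea is the same: the values flowing through the code are visible no matter what the names are.

```python
import sys


def a0(a1, a2):
    # Stand-in for an obfuscated routine: the names carry no meaning.
    a3 = 0
    for a4 in a1:
        if a4 > a2:
            a3 += a4
    return a3


def trace_calls(func, *args):
    """Run func and record (line offset, local variables) before each executed line."""
    code = func.__code__
    events = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore every frame except the target function
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            events.append((offset, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook again
    return result, events


if __name__ == "__main__":
    result, events = trace_calls(a0, [1, 5, 9], 4)
    for offset, local_vars in events:
        print(offset, local_vars)
    print("result:", result)
```

Reading the recorded locals makes it fairly clear that `a0` sums the elements of `a1` that exceed `a2`, even though nothing in the source says so.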

You mentioned that it took you several months to write ~20kLOC for your application. It would take almost an order of magnitude longer to reverse those equivalent 20kLOC from your application into workable source if you took the bare minimum precautions.

This is why it is only cost-effective to reverse small, industry specific algorithms from your application. Anything else and it isn't worth it.

Take the following fictionalized example: Let's say I just developed a brand new competing application for iTunes that had a ton of bells and whistles. Let's say it took several 100k LOC and 2 years to develop. One key feature I have is a new way of serving up music to you based on your music-listening taste.

Apple (being the pirates they are) gets wind of this and decides they really like your music-suggestion feature, so they decide to reverse it. They then home in on only that algorithm, and the reverse engineers eventually come up with a workable algorithm that serves up equivalent suggestions given the same data. Then they implement said algorithm in their own application, call it "Genius" and make their next 10 trillion dollars.

That is how source theft goes down.

No one would sit there and reverse all 100k LOC to steal significant chunks of your compiled application. It would simply be too costly and too time-consuming. About 90% of the time they would be reversing boring, non-industry-secret code that simply handled button presses or user input. Instead, they could hire developers of their own to re-write most of it from scratch for less money and simply reverse the important algorithms that are difficult to engineer and that give you an edge (i.e., the music-suggestion feature).

edited Jul 15 '19 at 08:27

Nurbol Alpysbayev


answered Feb 16 '09 at 01:26

mmcdole


Your statement, "The cost involved with reversing your application quickly exceeds the cost of writing it from scratch" seems to be rather the opposite of the statement I cited in the OP, i.e. "if you're worried about source theft ... obfuscation is almost trivial to a real cracker". – ChrisW Feb 16 '09 at 01:31
@ChrisW, not at all. Source Theft != Cracking. Reversing an application to the point of being able to crack it is nowhere near the same thing as reversing a significant chunk of source code from it. And obfuscation isn't what makes the reversing of any sizeable portion of an app difficult. – mmcdole Feb 16 '09 at 01:39
I'm not worried about someone's reversing the algorithm ... more worried about their repurposing the actual implementation of the algorithm (i.e. the source code) into their own product. Figuring that 20 KLOC is several months' work to develop, would it take more or less than several months to ... – ChrisW Feb 16 '09 at 02:16
... deobfuscate? Is it even necessary to deobfuscate something in order to 'steal' it, or might a sane competitor just incorporate it wholesale while still obfuscated, accept that it's a maintenance nightmare and hope that it needs little maintenance? Is most of the obfuscation "arms race" aimed ... – ChrisW Feb 16 '09 at 02:19
... at preventing people from even 'cracking' something (e.g. finding and deleting licensing), more than at preventing 'source theft'? – ChrisW Feb 16 '09 at 02:21
@ChrisW, the cost of source theft is greater than the cost of developing it from scratch in almost all cases except the ones I listed above. Most of the obfuscation and protection is used to prevent people from pirating your applications, and we all know how well that goes down. – mmcdole Feb 16 '09 at 02:38
What evidence is there for saying that "The cost of source theft is greater than the cost of developing it from scratch" and "It would take almost an order of magnitude longer to reverse those equivalent 20kLOC from your application into workable source if you took the bare minimum precautions"? – ChrisW Feb 16 '09 at 02:47
And, to be specific, might taking "the bare minimum precautions" mean 'using the Community Edition of Dotfuscator', or are the minimum precautions higher / more rigorous than this? – ChrisW Feb 16 '09 at 02:49
@ChrisW, if I honestly was worried then I would use any obfuscator (Dotfuscator is fine) ~and~ I would use a 3rd party packer. Since you are using .NET this has the added advantage of preventing people from using reflector on your application and it can be troublesome to unpack. – mmcdole Feb 16 '09 at 02:53
Your source code, as such, no longer exists in the executable, just the instruction set generated from the code. It is a blueprint, and it is not in the final product. It's like asking whether putting curtains over my building will stop people from stealing my blueprints ... there is no connection between the two in this case. – Robert Gould Feb 16 '09 at 02:53
@Robert Gould, I'm not sure what you are saying. Nowhere am I suggesting that your source code still exists in its original form. That doesn't mean algorithms can't be derived from your machine instructions, and in that sense it is source theft. – mmcdole Feb 16 '09 at 02:56
@Simucal, no, you don't imply it at all, but I think ChrisW might believe the source is somehow obtainable from the executable, when it's not. He seems more concerned about his line count than his algorithms. So I think he has a misconception that doesn't allow him to understand the problem correctly. – Robert Gould Feb 16 '09 at 03:05
@Robert Gould, Ahh, I see what you are saying now. Yes, I agree. – mmcdole Feb 16 '09 at 03:12
In the case of .Net (e.g. C#) the source code (including bodies of each method, symbolic names of local variables, everything except comments) can be retrieved from the intermediate language to which it's "compiled": i.e. a 1-to-1 reversible mapping to the source *is* in the executable. – ChrisW Feb 16 '09 at 03:51
@ChrisW, the .NET compiler changes many things, and it is never possible to go from compiled MSIL back to exactly what the developer's source was. You can get relatively close, but even just deciphering the compiler's "optimizations" can be tricky. If you don't want reflecting then use packers. – mmcdole Feb 16 '09 at 04:31
Thanks for the reference to the book. – ChrisW Feb 19 '09 at 10:57
I love the Apple example. I'll respond with an Italian Job reference. "Are you the real Napster?" – samoz Jun 10 '09 at 11:48
[This](http://meta.stackoverflow.com/q/332125/1835769) is the reason behind sudden upvotes. – displayName Aug 10 '16 at 19:08
10 years old - but still valid as it stands! Software piracy is a true innovation killer. – user492238 Oct 01 '19 at 07:04

wuputah · Answer 2 · 2009-02-16T00:50:55.317

10

Obfuscation is a form of security through obscurity, and while it provides some protection, the security is obviously quite limited.

For the purposes you describe, obscurity can certainly help, and in many cases, is an adequate protection against the risk of code theft. However, there is certainly still a risk that the code will be "unobfuscated" given sufficient time and effort. Unobfuscating the entire codebase would be effectively impossible, but if an interested party only wishes to determine how you did some certain part of your implementation, the risks are higher.

In the end, only you can determine whether the risk is worth it for you or your business. However, in many cases, this is the only option you have if you wish to sell your product to customers to use in their own environments.

Regarding the "why it's ineffective": the reason is that a cracker can use a debugger to see where your code is running regardless of what obfuscation technique is used. They can then use this to work around any protection mechanisms you've put in place, such as a serial number or "phone home" system.

I don't believe the comment was really referencing "code theft" in the sense that your code is going to be stolen and used in another project. Because they used the word "cracker," I believe they were talking about "theft" in terms of software piracy. Crackers specialize in working around protection mechanisms; they're not interested in using your source code for some other purpose.

edited Feb 16 '09 at 00:50

answered Feb 16 '09 at 00:19

wuputah


You say, "given sufficient time and effort". Do you know anything about quantifying that: how much time and effort? – ChrisW Feb 16 '09 at 02:38
In practicality, I don't think you have much to worry about. It would likely take years to fully deobfuscate your 20k LOC project; it would be much faster to reimplement it. Anything truly novel in your code would be patentable, which would be another line of defense in addition to your obfuscation. – wuputah Feb 16 '09 at 03:04
The fact remains that this is what obfuscation is for; anyone who looks at your code and sees it is obfuscated is not going to bother. The "arms race" is either to sell a new product (to one-up a competitor) or to try to make it even harder for crackers. – wuputah Feb 16 '09 at 03:07
It would be comforting to believe that "it would likely take years to fully deobfuscate your 20k LOC project; it would be much faster to reimplement it". Do you know of any evidence for that: why do you think that's true? – ChrisW Feb 16 '09 at 03:10
I can give you evidence to the contrary: game consoles, with specialized encryption, millions of dollars, and hardware keys to make stuff hard to figure out, get cracked within months if they are lucky, days if not. – Robert Gould Feb 16 '09 at 03:36
Robert, I'm explicitly not asking about 'cracking' licensing protection (i.e. about finding and then skipping small sections of functionality). – ChrisW Feb 16 '09 at 04:17

James Jones · Answer 3 · 2009-02-16T00:56:44.103

7

Most people tend to write what appears to be obfuscated code and that hasn't stopped the crackers so what's the difference?

EDIT:

Ok, serious time. If you really want to make something that's hard to break, look into polymorphic coding (not to be confused with polymorphism). Make code that is self-mutating, and it is a serious pain to break and will keep them guessing.

http://en.wikipedia.org/wiki/Polymorphic_code

In the end, nothing is impossible to reverse engineer.

edited Feb 16 '09 at 00:56

answered Feb 16 '09 at 00:38

James Jones


Hilarious and should be a comment, not an answer. – Kevin Loney Feb 16 '09 at 00:41
While I have had serious problems with the *code* people write using that obfuscation strategy, there are people in *that* obfuscation community who are able to work on and de-obfuscate that obfuscation without a problem. – displayName Aug 10 '16 at 17:51

score 6 · Answer 4 · edited Jan 12 '21 at 15:20

You are worried about people stealing the specific algorithms used in your product. Either you are Fair Isaac or you need to differentiate yourself using more than the way you x++;. If you solved some problem in code that cannot be solved by someone else puzzling over it for a few hours, you should have a PhD in computer science and/or patents to protect your invention. 99% of software products are not successful or special because of the algorithms. They are successful because their authors did the heavy lifting to put together well-known and easily understood concepts into a product that does what their customers need and sell it for cheaper than it would cost to pay others to re-do the same.

edited Jan 12 '21 at 15:20

Liam


answered Feb 16 '09 at 02:45

Rex M


Exactly. The "heavy lifting" isn't any super-secret algorithm, it's the 20 KLOC of carefully-designed and tested source code. And if everyone can reverse that more easily than I can write it in the first place, then it would be hard for me to compete with them by "selling it for cheaper". – ChrisW Feb 16 '09 at 02:56
@ChrisW, No one is going to reverse your "heavy lifting" cause it is nothing special! Like Rex was saying, they would simply write it themselves. The only time source-theft enters the game is if you have some specific algorithm that is complex or novel enough that others don't have it. – mmcdole Feb 16 '09 at 03:08
Assuming that the legality of the action is immaterial, they'll write it themselves if-and-only-if it's cheaper to do that than to reverse it. Is there any evidence about whether, and if so how much, more expensive it is to reverse obfuscated source than it is to write it from scratch? – ChrisW Feb 16 '09 at 03:28

score 3 · Answer 5 · answered Feb 16 '09 at 01:37

Look at it this way: the WMD editor that you typed your question into was reverse engineered by the SO team in order to fix some bugs and make some enhancements. That code was obfuscated. You are never going to stop intelligent, motivated people from hacking your code; the best you can hope for is to keep the honest people honest and make it somewhat hard to break.

score 2 · Answer 6 · answered Feb 16 '09 at 03:20

If you've ever seen the output from a disassembler, you'd realize why obfuscation will always fail.

answered Feb 16 '09 at 03:20

Benjamin Autin


score 1 · Answer 7 · answered Feb 16 '09 at 00:17

I tend to think that obfuscation is really not very effective if you want to protect your source. A real expert in the field (I don't mean a software expert or a cracker here, I mean an expert in the field of the functionality of the code) usually doesn't need to see the code. He or she only needs to see how it reacts to special inputs, edge cases, etc., to get an idea of how to implement a copy, or code that is equivalent to the protected functionality. Thus, it is not very helpful in protecting your know-how.
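A small sketch of that black-box approach, in Python for illustration only. `vendor_round` stands in for a protected routine whose code we pretend not to see, `clone_round` is an independent rewrite based on observed behaviour, and `find_mismatch` is a hypothetical helper that probes both with edge cases and random inputs. All three names and the rounding rule itself are assumptions made up for this example.

```python
import random


def vendor_round(value, step):
    """Stand-in for the protected routine; imagine only its behaviour is visible."""
    return ((value + step // 2) // step) * step


def clone_round(value, step):
    """Independent reimplementation written from observed behaviour only."""
    remainder = value % step
    if remainder * 2 >= step:
        return value - remainder + step  # round up to the next multiple
    return value - remainder             # round down


def find_mismatch(reference, candidate, trials=2000, seed=0):
    """Return the first input where the two functions disagree, or None."""
    rng = random.Random(seed)
    # Hand-picked edge cases first, then random probes (step is always positive).
    cases = [(0, 1), (1, 2), (-1, 2), (5, 10), (-5, 10), (14, 5)]
    cases += [
        (rng.randint(-10**6, 10**6), rng.randint(1, 1000)) for _ in range(trials)
    ]
    for value, step in cases:
        if reference(value, step) != candidate(value, step):
            return value, step
    return None


if __name__ == "__main__":
    print(find_mismatch(vendor_round, clone_round))
```

Agreement on a few thousand probes is not a proof of equivalence, but it is the sort of evidence a domain expert would work from, and obfuscating `vendor_round` would not change any of it.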

I'd be trying to prevent theft of the maintainable implementation (i.e. source code), not prevent reverse-engineering of the algorithm. — ChrisW, Feb 16 '09 at 00:48

score 1 · Answer 8 · answered Feb 16 '09 at 00:18

If you have IP in code which must be protected at all costs, then you should make your software's functionality available as a service, on a secured remote server.
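A minimal sketch of that split, using only Python's standard library and assuming a made-up `secret_score` function as the protected algorithm. The handler, the `/score` path and the JSON shape are all hypothetical, and a real deployment would add TLS, authentication and rate limiting. What it shows is that the client shipped to users contains only a network call, so there is nothing of the algorithm to decompile.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def secret_score(values):
    """Hypothetical proprietary algorithm; it only ever runs on the server."""
    return sum(v * v for v in values) % 97


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": secret_score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ScoreHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_score(port, values):
    """What ships to the user: a thin client with no algorithm in it."""
    request = Request(
        f"http://127.0.0.1:{port}/score",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
    )
    opener = build_opener(ProxyHandler({}))  # talk to localhost directly, no proxy
    with opener.open(request) as response:
        return json.loads(response.read())["score"]


if __name__ == "__main__":
    server = start_server()
    try:
        print(remote_score(server.server_address[1], [1, 2, 3]))
    finally:
        server.shutdown()
        server.server_close()
```

The trade-off is that a user control which must work offline cannot be built this way, which is why the remaining options all come back to raising the cost of reversing rather than removing the possibility.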

Good obfuscation will protect you up to a point, but it's all about the amount of effort required to break it against the 'reward' of having the code. If you are talking about stopping your average business user, then a commercial obfuscator should be sufficient.

score 0 · Answer 9 · answered Feb 16 '09 at 00:35

Short answer is yes and no; it depends entirely on what you are trying to prevent. Section twelve of Secure Programming Cookbook has some interesting comments on this on page 653 (which is conveniently unavailable in the Google Books preview). It classifies anti-tampering into four categories: zero day (slowing down an attacker so it takes them a long time to accomplish what they want), protection of a proprietary algorithm to prevent reverse engineering, "because I can" attacks, and a fourth one I can't remember. You have to ask "what am I trying to prevent?", and if you are really concerned about an individual getting a look at your source code then obfuscation has some value. Used on its own it's usually just an annoyance to someone attempting to mess with your application, and like any good security measure it works best when used in combination with other anti-tampering techniques.

I said what I'm trying to prevent: theft of source code by a competing software vendor who might then incorporate it into their own product. — ChrisW, Feb 16 '09 at 00:41
My answer was intended to be broad and cover obfuscation as a general anti-tampering approach. — Kevin Loney, Feb 16 '09 at 00:53
