
It's easy if the code isn't minified, but it's hard to tell minified and obfuscated code apart. I've found this: http://research.microsoft.com/pubs/148514/tr.pdf

How would I detect the difference between minified and obfuscated code?

Edited by Benjamin Gruenbaum; asked by karpada
  • I've amended your question, since library requests are forbidden here and it got you a lot of close/down votes. – Benjamin Gruenbaum May 10 '15 at 20:25
  • Although really - what that paper says should be enough - they just trained a classifier on code they downloaded and then obfuscated vs. minified. – Benjamin Gruenbaum May 10 '15 at 20:26
  • Hi Benjamin, I'm looking for a more general approach that will correctly handle new (or tweaked) obfuscators. I don't know how to do this and hope that someone here can help. – karpada May 12 '15 at 20:33
  • But the approach they mention in the article _will_ correctly handle new or tweaked obfuscators. – Benjamin Gruenbaum May 12 '15 at 22:05
  • Perhaps I didn't understand the paper. Section 4 states that they selected the features "c&0x1f" and "z >>> 5". Would it still work with a new obfuscator that avoids these features? – karpada May 13 '15 at 20:32
  • I'm looking for a way to capture the essence of obfuscation. – karpada May 13 '15 at 20:33
  • You can use feature extraction and techniques like mutual bootstrapping. The thing you should care about is having access to the obfuscator; once you do, you have a very large training set. – Benjamin Gruenbaum May 13 '15 at 20:33
  • Oh, in that case it's simple: by the nature of obfuscation there is no "essence of obfuscation". You can capture the "essence of current obfuscators", but one can always encode the code in some way (like base 64, any symmetric encryption, or even stuff like XML or tokens or whatnot) and then decode it at runtime, completely nullifying such attempts to capture the essence. You can't get 100% (which is fine, we almost never do 100% in machine learning problems), but you can detect 95% of obfuscated code pretty well. – Benjamin Gruenbaum May 13 '15 at 20:35
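Benjamin's suggestion boils down to turning each script into a vector of features and training a classifier on those vectors. Below is a minimal sketch of the feature-extraction half only, in JavaScript. The five features are hypothetical choices for illustration, not the ones used in the paper, and the classifier itself is left out:

```javascript
// Hypothetical features for telling packed/obfuscated JavaScript apart from
// ordinary minified code. These are illustrative guesses, not the paper's features.
function extractFeatures(src) {
  const count = (re) => (src.match(re) || []).length;
  const words = src.match(/[A-Za-z_$][\w$]*/g) || [];
  const strings = src.match(/'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"/g) || [];
  return {
    // Share of characters outside printable ASCII (the second packer sample is full of them).
    nonAsciiRatio: count(/[^\x20-\x7e]/g) / Math.max(src.length, 1),
    // Calls that build or decode code at runtime.
    dynamicCodeCalls: count(/\beval\s*\(|\bFunction\s*\(|fromCharCode/g),
    // Hex and unicode escapes such as \x41 or \u0041.
    escapeCount: count(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g),
    // Average length of identifier-like words (keywords and words inside strings included).
    avgWordLength: words.length ? words.reduce((n, w) => n + w.length, 0) / words.length : 0,
    // Longest string literal; packed payloads tend to be one very long string.
    longestStringLiteral: Math.max(0, ...strings.map((s) => s.length)),
  };
}

// Example: features of a tiny packed-style snippet.
console.log(extractFeatures("eval(function(p,a,c,k,e,d){return p}('0 1',2,2,'var|x'.split('|'),0,{}))"));
```

Vectors like these, computed over many known minified and known obfuscated samples, could then be fed to any off-the-shelf classifier. As the last comment points out, an obfuscator that base64-encodes or encrypts its payload can still slip past whatever features are picked.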

1 Answer

There isn't much to talk about here.

But first, let's ask a question: what is minified code?

Well, that isn't too hard to answer: Wikipedia has a definition! But it doesn't explain how to actually produce minified code.

Basically, you need to reduce your code as much as possible, but retain the same functionality.

Let's analyze some code!

var times;

times = window.prompt('Insert a number','5');

times = parseInt( times, 10 );

if( !isNaN(times) )
{
  for(var i=0; i<=10; i=i+1 )
  {
    document.write(times + ' &times; ' + i + ' = ' + ( i * times) + '<br/>');
  }
}
else
{
  alert('Invalid number');
}

Now, we can reduce that code a lot!

And that is what minifying is all about.

Now, let's look at this code:

var i=0,t=window.prompt('Insert a number',5);if(t/1==t/1)for(;i<11;i++)document.write(t+' &times; '+i+' = '+(i*t)+'<br/>');else alert('Invalid number');

For ordinary integer input it does the same thing, in far less code!
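This can be checked outside a browser. The sketch below is a Node adaptation, not the original code: `original` and `minified` are hypothetical wrappers that take the input as a parameter instead of calling window.prompt, and collect output lines instead of calling document.write or alert. The two agree on inputs such as '5' and 'abc', but not on every input: parseInt('5.5', 10) is 5, while '5.5'/1 is 5.5.

```javascript
// Readable version, with prompt/document.write/alert replaced by a parameter and an array.
function original(input) {
  const out = [];
  const times = parseInt(input, 10);
  if (!isNaN(times)) {
    for (let i = 0; i <= 10; i = i + 1) {
      out.push(times + ' &times; ' + i + ' = ' + (i * times) + '<br/>');
    }
  } else {
    out.push('Invalid number');
  }
  return out;
}

// Minified version, same adaptation; the if/for/else structure is kept as written.
function minified(t) {
  var i = 0, out = [];
  if (t / 1 == t / 1) for (; i < 11; i++) out.push(t + ' &times; ' + i + ' = ' + (i * t) + '<br/>');
  else out.push('Invalid number');
  return out;
}

// True when both versions produce exactly the same lines for this input.
function sameOutput(input) {
  const a = original(input), b = minified(input);
  return a.length === b.length && a.every((line, idx) => line === b[idx]);
}

console.log(sameOutput('5'));   // true
console.log(sameOutput('abc')); // true
console.log(sameOutput('5.5')); // false
```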

What I did:

  • Shortened the variable names
  • Declared both variables in a single var statement
  • Reduced the number of times a value is assigned to a variable
  • Replaced the string '5' with the number 5
  • Removed the parseInt() call
  • Replaced !isNaN(times) with t/1==t/1
    If t isn't a number, t/1 will be NaN.
    NaN==NaN is false, because NaN is not equal to anything, not even itself.
  • Rewrote the loop as for(;i<11;i++)
  • Removed whitespace (spaces, newlines)
  • Removed braces
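The t/1==t/1 trick relies on NaN being the only JavaScript value that is not equal to itself. A small Node sketch comparing it with the original !isNaN(parseInt(...)) check; `viaIsNaN` and `viaSelfCompare` are hypothetical names used only here:

```javascript
// Check used by the readable version.
function viaIsNaN(t) {
  return !isNaN(parseInt(t, 10));
}

// Check used by the minified version: x == x is false exactly when x is NaN.
function viaSelfCompare(t) {
  return t / 1 == t / 1;
}

for (const input of ['5', 'abc', '', '5abc', null]) {
  console.log(JSON.stringify(input), viaIsNaN(input), viaSelfCompare(input));
}
```

The two checks agree on '5' and 'abc', but disagree on '', '5abc' and null (null is what window.prompt returns when the user presses Cancel): '' and null divide to 0, while parseInt('5abc', 10) is 5 even though '5abc'/1 is NaN. So the shortcut relies on the input being a well-formed number.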

This code can be reduced even further, but you can still see what it does (with a little more effort).

There are more techniques to reduce code size, but I won't go into detail.


But, now, another question: What is obfuscated code?

Obfuscated code is code that is incomprehensible to us.

You can read the code, but the functionality won't be easily understood.

This goes a lot further than minifying. Reducing its size isn't a requirement.

But most of the time, obfuscated code is also shrunk, in a way you wouldn't understand.

Only those who know will be able to understand it.
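To make "reducing its size isn't a requirement" concrete, and to show the base64 trick mentioned in the comments on the question, here is a toy wrapper for Node. `wrapBase64` is a hypothetical helper written for this illustration, not a real obfuscator:

```javascript
// Hypothetical toy "obfuscator": base64-encode the source and decode it at runtime.
// The output is unreadable at a glance, yet LONGER than the input. Node only (uses Buffer).
function wrapBase64(src) {
  const payload = Buffer.from(src, 'utf8').toString('base64');
  return "new Function(Buffer.from('" + payload + "', 'base64').toString('utf8'))()";
}

const source = 'return 6 * 7;';
const wrapped = wrapBase64(source);

console.log(source.length, wrapped.length); // the wrapped version is several times longer
console.log(wrapped);
console.log(eval(wrapped)); // 42, the same result as running the source directly
```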

JSF*ck is an example of this.
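JSF*ck writes any program using only the six characters [, ], (, ), ! and +. A few of its basic building blocks can be checked in Node or a browser console; the variable names are added here only for readability (real JSF*ck output has none):

```javascript
// Each right-hand side uses only the characters []()!+
const zero = +[];                // [] converts to "" and then to 0
const one = +!+[];               // !0 is true, and +true is 1
const two = !+[] + !+[];         // true + true is 2
const falseStr = ![] + [];       // ![] is false; false + [] is the string "false"
const trueStr = !![] + [];       // "true"
const f = (![] + [])[+[]];       // "false"[0] is "f"
const r = (!![] + [])[+!+[]];    // "true"[1] is "r"

console.log(zero, one, two, falseStr, trueStr, f + r);
```

Real JSF*ck output combines pieces like these, via indexing into such strings, to spell out property names and eventually run arbitrary code.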

Here is what the code from above looks like after running it through two online tools:

Obfuscated using http://www.jsobfuscate.com/ :

eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('4 2;2=d.e(\'c a 3\',\'5\');2=b(2,6);7(!8(2)){9(4 i=0;i<=6;i=i+1){h.j(2+\' &2; \'+i+\' = \'+(i*2)+\'<f/>\')}}k{l(\'g 3\')}',22,22,'||times|number|var||10|if|isNaN|for||parseInt|Insert|window|prompt|br|Invalid|document||write|else|alert'.split('|'),0,{}))

Obfuscated using http://packer.50x.eu/ :

eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(c/a))+String.fromCharCode(c%a+161)};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\[\xa1-\xff]+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp(e(c),'g'),k[c])}}return p}('£ ¡;¡=©.¨(\'§ a ¢\',\'5\');¡=¥(¡,¤);¦(!ª(¡)){«(£ i=0;i<=¤;i=i+1){®.¬(¡+\' &¡; \'+i+\' = \'+(i*¡)+\'<±/>\')}}­{¯(\'° ¢\')}',17,17,'times|number|var|10|parseInt|if|Insert|prompt|window|isNaN|for|write|else|document|alert|Invalid|br'.split('|'),0,{}))

The output of those two tools has a few things in common:

  • Both call eval()
  • Both define a function with the parameters p,a,c,k,e,d
  • Both end with a list of the words (keywords, identifiers and words from strings) used in the original code
  • Both use string replacement to rebuild the original code from that list before running it
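Those similarities are regular enough to check mechanically. A minimal sketch, assuming the p,a,c,k,e,d wrapper seen in both samples: `looksPacked` and `unpack` are hypothetical helpers, the `[dr]` alternative is meant to also catch packer versions that name the last parameter r, and `unpack` only mirrors the decoding loop of the first sample (the second sample encodes its tokens as high-ASCII characters and is not handled). It rebuilds the code as a string instead of passing it to eval():

```javascript
// Signature check for the eval(function(p,a,c,k,e,d){...}) wrapper.
function looksPacked(src) {
  return /eval\(function\(p,a,c,k,e,[dr]\)/.test(src);
}

// Mirrors the first sample's fast path: d[c.toString(a)] = k[c] || c.toString(a),
// then every \w+ word in the payload is replaced by its dictionary entry.
// Unlike the packer, words missing from the dictionary are left unchanged.
function unpack(p, a, c, k) {
  const dict = new Map();
  for (let i = 0; i < c; i++) {
    const token = i.toString(a);
    dict.set(token, k[i] || token);
  }
  return p.replace(/\b\w+\b/g, (word) => dict.get(word) ?? word);
}

// The keyword list and the start of the payload from the first sample.
const keywords = '||times|number|var||10|if|isNaN|for||parseInt|Insert|window|prompt|br|Invalid|document||write|else|alert'.split('|');
console.log(unpack("4 2;2=d.e('c a 3','5');", 22, 22, keywords));
// var times;times=window.prompt('Insert a number','5');
```

A signature check like this only recognizes one family of tools; code obfuscated by hand, or by a different tool, need not contain anything like it.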

But does all obfuscated code look like this? No, it doesn't.

Here is an example:

var ________________ = [] + []; var _ = +[]; _++; var _____ = _ + _;
var ___ = _____ + _____; var __ = ___ + ___; var ____ = __ + __; var ______ = ____ + ____;
var _______ = ______ + _; var ___________ = ______ + ______ + __;
var ______________ = ___________ + ____ -  _; var ____________ = _ + _____;
var ________ = _______ * ____________ + _; var _________ = ________ + _;
var _____________ = ______________ + ______ - ___ - _; var __________ = _____________ -
____________; var _______________ = __________ - ____________; document.write(________________ +
String.fromCharCode(___________, _________, _______________, _______________, __________,
______, ______________, __________, _____________, _______________, ________, _______));

This was taken from a Code Golf answer; you can view the original here: https://codegolf.stackexchange.com/a/22746/14732
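Working out what that code does by hand amounts to rebuilding it with readable names. Here it is rewritten that way, one line per line of the original and in the same order; the names are chosen here for readability, and each comment gives the value the original variable ends up holding:

```javascript
const empty = [] + [];           // "" ([] + [] is the empty string)
let n1 = +[];                    // 0 ...
n1++;                            // ... then incremented to 1
const n2 = n1 + n1;              // 2
const n4 = n2 + n2;              // 4
const n8 = n4 + n4;              // 8
const n16 = n8 + n8;             // 16
const space = n16 + n16;         // 32 (' ')
const bang = space + n1;         // 33 ('!')
const H = space + space + n8;    // 72 ('H')
const W = H + n16 - n1;          // 87 ('W')
const n3 = n1 + n2;              // 3
const d = bang * n3 + n1;        // 100 ('d')
const e = d + n1;                // 101 ('e')
const r = W + space - n4 - n1;   // 114 ('r')
const o = r - n3;                // 111 ('o')
const l = o - n3;                // 108 ('l')

// Same character codes, same order as the original fromCharCode call.
const message = empty + String.fromCharCode(H, e, l, l, o, space, W, o, r, l, d, bang);
console.log(message); // Hello World!
```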

How do you tell this apart? You simply can't, unless you are a super genius who can look at obfuscated code and see what it does.

You would need a really smart algorithm to work out what the code does and then rebuild it from that. If the rebuilt code and the original code aren't the same, the original may be obfuscated.


Conclusion: you can't tell obfuscated code apart from minified code.

Answer by Ismael Miguel