
Google uses bsdiff and Courgette for patching binary files like the Chrome distribution. Do any similar tools exist for patching jar files?

I am updating jar files remotely over a bandwidth-limited connection and would like to minimize the amount of data sent. I do have some control over the client machine to some extent (i.e. I can run scripts locally) and I am guaranteed that the target application will not be running at the time.

I know that I can patch java applications by putting updated class files in the classpath, but I would prefer a cleaner method for doing updates. It would be good if I could start with the target jar file, apply a binary patch, and then wind up with an updated jar file that is identical (bitwise) to the new jar (from which the patch was created).

Ken Liu
  • What advantage are you looking for over simply treating the jar as a binary file and patching it? – Devon_C_Miller Mar 30 '10 at 21:17
  • Greater compression. Bsdiff and Courgette achieve higher compression ratios because the algorithms they use are designed specifically for compressing large executables. I don't know, but it seems to me that the same thing could be done for jar files. – Ken Liu Mar 31 '10 at 01:53
  • Related question: http://stackoverflow.com/questions/3738523/patching-java-software – Tahir Akhtar May 24 '11 at 09:49
  • To prepare a *patch*, decompress the *jar* and then run `bsdiff`. Transmit the *patch*; decompress the old *jar* and run `bspatch`. Compress the output to get the new *jar*. The `bsdiff`/`bspatch` suite is architecture-independent and should work fine on Java byte code. – artless noise Jun 26 '13 at 22:18
  • Jar files are zip files under the hood, so what's required is something that efficiently patches zip files. You could just unzip them, run a binary diff tool, and zip them up again, but I guess this might not be optimal, and you might end up with slightly different jar files at the end of the process, depending, for example, on the version of the zip algorithm used to make them. – rjmunro Oct 09 '13 at 13:37
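Several of the comments above boil down to the same idea: a generic binary diff works far better on uncompressed bytes than on deflated ones. A minimal sketch of that preprocessing step, using only `java.util.zip` (the class name `UncompressJar` is made up for illustration), rewrites a jar so every entry is STORED; two files prepared this way can then be diffed directly with `bsdiff`:

```java
import java.io.*;
import java.util.Enumeration;
import java.util.zip.*;

public class UncompressJar {
    // Rewrite a jar/zip so every entry is STORED (no deflate).
    // Diffing two such files with bsdiff yields much smaller patches,
    // because a small class-file change no longer ripples through the
    // compressed byte stream.
    public static void uncompress(File in, File out) throws IOException {
        try (ZipFile zip = new ZipFile(in);
             ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(out))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                byte[] data = zip.getInputStream(e).readAllBytes();
                ZipEntry stored = new ZipEntry(e.getName());
                stored.setMethod(ZipEntry.STORED);
                stored.setSize(data.length);           // STORED entries require
                stored.setCompressedSize(data.length); // size and CRC up front
                CRC32 crc = new CRC32();
                crc.update(data);
                stored.setCrc(crc.getValue());
                zos.putNextEntry(stored);
                zos.write(data);
                zos.closeEntry();
            }
        }
    }
}
```

Note that, as rjmunro points out, re-deflating after patching may not reproduce the original jar bit-for-bit; meeting the question's bitwise-identical requirement would also mean carrying the original zip metadata in the patch.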

4 Answers


Try the javaxdelta project on SourceForge. It should allow you to create patches and to apply them.

[EDIT] A jar-specific tool doesn't exist yet. Open the JAR file with the usual tools, use javaxdelta to create one patch per entry in the JAR, ZIP the patches up, and copy them onto the server.

On the other side, you need to install a small executable JAR which takes the patch and the JAR file as arguments and applies the patch. You will have to write this one, too, but that shouldn't take much more than a few hours.
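A sketch of what that small applier's driver loop might look like, using only `java.util.zip`. The delta codec itself is abstracted behind an interface, which is where javaxdelta (or any other binary-diff library) would plug in; every name here is hypothetical, the patch archive is assumed to hold one `<entry name>.patch` per changed entry, and entries added or deleted between versions are not handled:

```java
import java.io.*;
import java.util.Enumeration;
import java.util.zip.*;

public class JarPatcher {
    // Pluggable delta codec: javaxdelta, bsdiff, xdelta, etc. would go here.
    interface EntryPatcher {
        byte[] apply(byte[] original, byte[] patch) throws IOException;
    }

    // Rebuild newJar from oldJar, applying "<entry name>.patch" from the
    // patch archive wherever one exists; unchanged entries are copied.
    public static void patch(File oldJar, File patchZip, File newJar,
                             EntryPatcher patcher) throws IOException {
        try (ZipFile oldZip = new ZipFile(oldJar);
             ZipFile patches = new ZipFile(patchZip);
             ZipOutputStream out = new ZipOutputStream(new FileOutputStream(newJar))) {
            Enumeration<? extends ZipEntry> en = oldZip.entries();
            while (en.hasMoreElements()) {
                ZipEntry e = en.nextElement();
                byte[] data = oldZip.getInputStream(e).readAllBytes();
                ZipEntry p = patches.getEntry(e.getName() + ".patch");
                if (p != null) {  // a patch exists for this entry: apply it
                    data = patcher.apply(data, patches.getInputStream(p).readAllBytes());
                }
                out.putNextEntry(new ZipEntry(e.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
    }
}
```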

Aaron Digulla
  • Hi Aaron - I'm not looking for a binary patch algorithm implemented in java, I'm looking for a patch tool that is tailored for working specifically with JAR files, along the lines of pack200. – Ken Liu Mar 30 '10 at 18:24
  • @Ken – What Aaron is suggesting isn't a real algorithm, but it is what you will want to do. Jar files are zip files with class files in them. A binary diff of old/foo.class and new/foo.class is likely to reveal few differences, while a binary diff of old/foo.jar and new/foo.jar is likely to look far more different. You can simply compress (zip) the binary diffs of the class files and send that to your endpoints. Each endpoint then unzips the jar file that needs to be patched, applies the patch, and zips it back up. – nategoose Apr 03 '10 at 21:50
  • (continued) You may not get the exact same binary jar files if your endpoints use different versions of the compression tools than your front-end machine does. Diff and compression are closely related fields, but they often do not work well with each other: both look for patterns of sameness and change. Diffing compressed files will almost always amplify the differences. Compressing diffs, on the other hand, will likely yield only fair-to-good compression ratios, but the result will be much smaller than a diff of the compressed files, even if that diff is also compressed. – nategoose Apr 03 '10 at 21:57
  • Guys, I get what you are saying, I really do. I'm not talking about binary diffing jar files. Did you read up on Courgette and pack200? Courgette uses an algorithm that works better than bsdiff because it specifically targets executable binaries. What I am imagining is an algorithm that exploits characteristics implicit to class files in order to produce a binary diff that is smaller than what a typical binary diff produces. Pack200 does this for compression by rearranging the contents of the class files contained in a jar. – Ken Liu Apr 04 '10 at 03:33

.jar files are already compressed, so what you're really asking for is a compression that works well on zip files ;). If you get at the data before you stuff it into the jar, I expect you have better odds of taking advantage of the knowledge that it is Java. I suspect you could adapt the factorization described in this paper as a Java-specific compression method.

Logan Capaldo
  • pack200 already deals with the compression problem; what I'm looking for is something that deals with patching. http://java.sun.com/j2se/1.5.0/docs/guide/deployment/deployment-guide/pack200.html – Ken Liu Mar 31 '10 at 13:16

The problem is that very small changes in the source can cause very big changes in the compressed .JAR file.

This is an artifact of removing redundancy. So no matter how good your diff tool is, it's got a nearly impossible task on its hands.

However - there is a solution, which is to generate the diffs and apply the patches to the uncompressed data. For example:

Say you have:

    project-v1.jar
    project-v2.jar

The diff between these two files is likely to be huge, even though the internal change could be very small. So say we have an 'unjar' and a 'rejar' program; we can generate:

    project-v1.jar -> project-v1.jar.unjar
    project-v2.jar -> project-v2.jar.unjar

Then generate the patch by diffing the 'unjar' files. Applying the patch would be:

    project-v1.jar (unjar)-> project-v1.jar.unjar -(apply patch)-> project-v1.patched.unjar (rejar)-> project-v1.patched.jar

Effectively, the 'unjar' program (and 'rejar', its reverse) should take a source ZIP file (or any other type of file) and uncompress the contents, including the headers, attributes and any other details, to an output stream (rather than creating individual files).

This ought not to be a very complicated filter to write. An added bonus would be to make it compression-aware and recursive (so you could apply it, say, to a WAR file).
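A sketch of such a filter, including the recursive bonus. Everything here is an assumption for illustration: the class name, the choice of `DataOutputStream` framing (name, then either a length-prefixed body or a nested section), and the file-extension test for nested archives are all made up; only the 'unjar' direction is shown, and the original zip metadata needed for a bit-for-bit 'rejar' is not preserved:

```java
import java.io.*;
import java.util.zip.*;

public class Unjar {
    // Stream a zip's contents out uncompressed, recursing into nested
    // archives (e.g. jars inside a WAR) so a diff tool always sees raw
    // bytes. Entry names and lengths are written so a 'rejar' step could
    // invert the process.
    public static void unjar(InputStream in, DataOutputStream out) throws IOException {
        ZipInputStream zip = new ZipInputStream(in);
        for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
            byte[] data = zip.readAllBytes();   // reads to the end of this entry
            out.writeUTF(e.getName());
            if (e.getName().endsWith(".jar") || e.getName().endsWith(".war")
                    || e.getName().endsWith(".zip")) {
                out.writeUTF("<nested>");       // recurse into the nested archive
                unjar(new ByteArrayInputStream(data), out);
                out.writeUTF("</nested>");
            } else {
                out.writeInt(data.length);
                out.write(data);
            }
        }
    }
}
```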

user340535
  • The problem is that source code (.java files) is larger than the compiled code in a .JAR file. So the diff between .java files can be larger than the diff between two .JAR or .DEX files. – Nikolai Samteladze Sep 23 '12 at 22:29
  • .unjar doesn't imply source code; it implies an (uncompressed) archive of .class files, i.e. a jar made with the 0 (no zipping) option. You would also compress your diff! – user340535 Dec 07 '12 at 13:54

Check out delta-updater.

It is built for binary diffing and patching of directories; you can use IOFilter to filter the included files.

DrAhmedJava