I have a program that is compressing a string in an unknown way. I know a few inputs and the outputs they produce, but I am not sure which algorithm is being used to compress the string.

Here are my examples.

(just 38 x a, no spaces or anything else)

In:  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
Out: "21 1A A6 30 00"

(just 32 x a)

In:  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
Out: "1c 1a a7 a0 00"

(31 x a, then 1 b)

In:  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
Out: "01 77 c5 53 c0 00"

(31 x b, then 1 a)

In:  "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbba"
Out: "1e 77 54 f3 80 00"


In:   "Hey wot u doing 2day u wanna do something"
Out:  "11 C7 C6 2E 78 CE 6B 8E 3A CD 83 E8 1B 37 C5 C5 A6 B9 D1 E1 B0 69 63 DB 5E 71 15 5C 10 00"

(same as previous string, but with a space at the end)

In:  "Hey wot u doing 2day u wanna do something "
Out: "12 C7 71 8B 9E 33 9A E2 EB 36 0F A0 2C DF 17 17 7A 67 47 86 DF 4B 1E DA F3 88 AA E0 80 00"

Any help / advice would be great, thanks! Also, it may help to know these are from a BlackBerry 8120

James
  • Can you try compressing some other inputs, e.g. a null string, a single character, two characters ? – Paul R Feb 22 '10 at 15:53
  • I can't, I'm afraid; I don't have the program. I've just been given these examples and asked to work out the compression method... I've been told it won't compress strings below 30 characters, though, as it is not efficient to do so. – James Feb 22 '10 at 16:06
  • One other observation to add; for the two similar strings, there are two repeated bytes at the same point. For the first it is c5c5, and the second is 1717. Might be coincidence, might also relate to the "nn" in wanna perhaps? – James Feb 23 '10 at 09:33
  • Updated with some more strings I managed to get. – James Mar 18 '10 at 16:21

1 Answer

It's unlikely that someone can figure out what kind of compression algorithm is being used just by looking at the supplied strings.

Assuming the outputs are merely compressed and not also encrypted (that is, transformed by an algorithm without a key or other kind of secret), the only approach I can think of is brute force: write some code that runs the known inputs through different compression algorithms and compares the results with the known outputs. It does not seem to be the DEFLATE algorithm (LZ77 plus Huffman coding) used by the .NET DeflateStream and GZipStream classes, so you can skip at least one ;)
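
The brute-force idea can be sketched roughly as follows. This is only a minimal illustration using the compressors in Python's standard library; the names `CANDIDATES`, `raw_deflate` and `find_matches` are made up for this example, and the list of algorithms is an assumption, not a claim about what the BlackBerry actually uses.

```python
import bz2
import gzip
import lzma
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress with DEFLATE but without the zlib header and checksum."""
    # A negative wbits value asks zlib for a raw DEFLATE stream.
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


# Hypothetical candidate list: extend it with any other algorithm to try.
CANDIDATES = {
    "zlib": zlib.compress,
    "raw deflate": raw_deflate,
    # mtime=0 keeps the gzip header free of the current timestamp.
    "gzip": lambda data: gzip.compress(data, mtime=0),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def find_matches(plaintext: str, expected_hex: str) -> list[str]:
    """Return the names of candidates whose output equals the known bytes."""
    expected = bytes.fromhex(expected_hex)  # spaces between bytes are allowed
    data = plaintext.encode("ascii")
    return [name for name, compress in CANDIDATES.items()
            if compress(data) == expected]


# Two of the known input/output pairs from the question.
samples = [
    ("a" * 38, "21 1A A6 30 00"),
    ("a" * 32, "1c 1a a7 a0 00"),
]

for text, expected_hex in samples:
    print(len(text), "chars ->", find_matches(text, expected_hex) or "no match")
```

An exact comparison is the strictest possible check. If the device strips or adds its own header bytes, the comparison would need to be loosened, for example by testing only part of each output.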

My recommendation would be to look at the BlackBerry SDK and find out what algorithms it supports, as it's likely to be one of those.

You may also find this tutorial to be of interest: Hacking Data Compression

Morten Mertner
  • Thanks for the link, I'll work through that now. The BlackBerry SDK shows support for zlib, gzip and deflate... None of these seem to work. There are a lot of crypto API files on the BlackBerry, but I'm hoping it's just compressed and not encrypted in any way. I was hoping I'd just missed something :) but I'll try the brute-force way as you suggest and let you know how I get on. – James Mar 24 '10 at 09:04