I need to parse some data containing encoded primitive types (ints, floats, doubles) output by Java. I'm adding this functionality to an existing set of Python scripts, so rewriting it in Java isn't really an option. I'd like to re-implement the decoding or use a Python library to decode the data (e.g. TH3IFMw for a float).

I don't recognize this encoding. I'm working with the requests sent to Google Web Toolkit, and based on the source here and here, I thought it was String.valueOf, but this is incorrect. Does anyone recognize it?

Tom
  • The source code of the class library is typically bundled with the JDK. Check out src.zip in your JDK root directory. – Péter Török Aug 23 '11 at 15:37
  • @Tom: How was the data encoded as TH3IFMw? String.valueOf will not decode this kind of data. – Cam Aug 23 '11 at 15:45
  • @Cam: It's Google Web Toolkit encoding something as a float. Some source I saw indicated it was string.ValueOf - this seems to be wrong. Since the entire question is flawed and misleading, I'd like someone to flag it for deletion. – Tom Aug 23 '11 at 15:48
  • Ok - so there's been some confusion here. First of all, String.valueOf() does _not_ do what you have suggested. Instead, it takes in various kinds of arguments (bools, ints, characters, etc.) and converts them to strings. Can you tell us how the data was encoded as TH3IFMw so we can help you figure out how to decode it? – Cam Aug 23 '11 at 15:49
  • @Cam: I don't know. I thought it was string.ValueOf based on http://code.google.com/p/google-web-toolkit/source/browse/trunk/user/src/com/google/gwt/user/server/rpc/impl/ServerSerializationStreamWriter.java?r=10146#134 and http://code.google.com/p/google-web-toolkit/source/browse/trunk/user/src/com/google/gwt/user/client/rpc/impl/AbstractSerializationStreamWriter.java?r=1302#49 – Tom Aug 23 '11 at 15:51
  • The edit to your question you've provided is very helpful (hence the upvotes ;) ). Can you provide us with a specific situation where this kind of value is returned to you by GWT? For example, what number does that string actually represent? – Cam Aug 23 '11 at 15:58
  • Do you know what the number you're actually expecting is? Looks like base64 with a missing '=' to me. – Affe Aug 23 '11 at 16:24
  • @Affe: I tried that - it didn't work. – Cam Aug 24 '11 at 04:20
  • Yeah, there's 5 bytes worth of characters there. Can't imagine what else it could be though. – Affe Aug 24 '11 at 05:31

1 Answer

I think this is encoding a long int, not a float. In particular, it's probably 0x0000004c7dc814cc, but might be 0x00000131f7205330.


My reasoning...

Looking through the code you linked to, it doesn't look like anything remotely out of the ordinary is being done to floats, and the standard valueOf implementation definitely does nothing like this.

On the other hand, the string TH3IFMw looks for all the world like a base64 encoded string. I can't think of many other common encodings that use upper alpha, lower alpha, and digits. Looking through the same code, I can only find one reference to base64... line 575 of StreamWriter, where it handles the encoding of long instances. This is the only part of the linked code which seems even remotely capable of generating the output you observed.

Looking at the size of the string... assuming it is base64, it's missing a trailing = padding/alignment character, but some implementations of base64 do omit these for brevity. Adding that back (TH3IFMw=), and decoding as base64, this results in the hex value 0x4c7dc814cc. This is only 5 bytes in size, which is a little odd. But this does mean it's probably not a float (4 bytes) or double (8 bytes).
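To reproduce that step from Python, here is a minimal sketch using only the standard library. The helper name `decode_unpadded_base64` is made up for illustration, and it assumes the string really is standard base64 with its padding stripped.

```python
import base64


def decode_unpadded_base64(text: str) -> bytes:
    """Restore any stripped '=' padding, then decode as standard base64.

    Hypothetical helper: assumes the input uses the standard alphabet.
    """
    # base64 text must be a multiple of 4 characters long
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)


raw = decode_unpadded_base64("TH3IFMw")
print(raw.hex())  # 4c7dc814cc
print(len(raw))   # 5 bytes, so neither a 4-byte float nor an 8-byte double
```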

But this could fit with line 575's encoding of a long... looking at the documentation for Base64Utils.toBase64, it makes reference to the fact that "Leading groups of all zero bits are omitted." This would explain the 5 byte value, if the original long was 0x0000004c7dc814cc.

However, the documentation's wording is frustratingly ambiguous (and I don't have java+gwt available to me right now to test). "Leading groups of all zero bits" could mean they are omitting source bytes which are all zeros, but it could also mean they're omitting leading A characters from the encoded base64 characters (A represents six zero bits in base64). If that's the case, then the actual base64 string is ATH3IFMw, which decodes to the long value 0x00000131f7205330.
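Both readings can be tried side by side in Python. This is only a sketch: the function names are invented for illustration, and it assumes the standard base64 alphabet. GWT's own alphabet may differ in the last two symbols, but none of those occur in this string, so the result here would be the same either way.

```python
import base64

# Standard base64 alphabet (an assumption; see the note above).
STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def long_from_dropped_bytes(text: str) -> int:
    """Reading 1: leading all-zero *bytes* were dropped before encoding."""
    raw = base64.b64decode(text + "=" * (-len(text) % 4))
    return int.from_bytes(raw, "big")


def long_from_dropped_chars(text: str) -> int:
    """Reading 2: leading 'A' characters (six zero bits each) were dropped.

    Each character is then simply one base-64 digit of the number.
    """
    value = 0
    for ch in text:
        value = value * 64 + STANDARD_ALPHABET.index(ch)
    return value


print(hex(long_from_dropped_bytes("TH3IFMw")))  # 0x4c7dc814cc
print(hex(long_from_dropped_chars("TH3IFMw")))  # 0x131f7205330
```

Comparing both candidates against the values you expect in the request should show which reading, if either, applies.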

If you can find either of those numbers in what you're providing as input, then that's probably what's happening. If not... I'm afraid I'm stumped.

Eli Collins