Decoding html returned as json response - android

Question

I am getting the following encoded HTML as a JSON response and have no idea how to decode it to a normal HTML string (which is an anchor tag, by the way).

x3ca hrefx3dx22http:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx22x3ehttp:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx3c\/ax3e

I have tried java.net.URLDecoder.decode without any luck.

That's not JSON at all. Where is this data coming from that is claiming it is JSON? — Tyler, Sep 23 '10 at 06:05
here is the actual JSON response [{"type":"text","text":"Resentment - B\x27Day is the second studio album by American R\x26B singer Beyoncé Knowles, released September 4, 2006, on Columbia Records in collaboration with Music World Music and Sony Urban Music. Its release coincided with Knowles\x27 twenty-fifth birthday. ...","language":"en"},{"type":"url","text":"\x3ca href\x3d\x22http://en.wikipedia.org/wiki/Resentment_(song)\x22\x3ehttp://en.wikipedia.org/wiki/Resentment_(song)\x3c/a\x3e","language":"en"}] — Waqas, Sep 23 '10 at 06:13

Keenora Fluffball · Answer 1 · 2014-01-13T07:01:03.220

7

The term you are looking for is "UTF-8 code units". These code units are basically a backslash, followed by an "x" and a two-digit hex ASCII code. I wrote a little converter method for you:

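public static String convertUTF8Units(String input) {
    String part = "", output = input;
    // Slide a 4-character window over the input, looking for "\xHH".
    for(int i=0;i<=input.length()-4;i++) {
        part = input.substring(i, i+4);
        if(part.startsWith("\\x")) {
            // Parse the two hex digits into a single byte.
            byte[] rawByte = new byte[1];
            rawByte[0] = (byte) (Integer.parseInt(part.substring(2), 16) & 0x000000FF);
            // Note: this uses the platform default charset, so only ASCII values are reliable.
            String raw = new String(rawByte);
            // Replaces every occurrence of this escape in the output.
            output = output.replace(part, raw);
        }
    }

    return output;
}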

I know, it's a bit frowzy, but it works :)

edited Jan 13 '14 at 07:01

answered Sep 23 '10 at 07:21

Keenora Fluffball


Thanks Keenora, but I already did it using a regular expression – Waqas Sep 27 '10 at 08:41
I needed it for PowerShell and I could not get it converted in a fast way, then I found a way simpler method here: https://stackoverflow.com/a/49344121/2964949 – Patrick Oct 31 '18 at 10:48

score 1 · Accepted Answer · edited May 23 '17 at 12:18

That's not an encoding I've seen before, but it looks like xYZ (where Y and Z are hex digits [0-9a-f]) means "the character whose ASCII code is 0xYZ". I'm not sure how the letter x itself would be encoded, so I would recommend trying to find out. But then you can just do a find and replace on the regex x([0-9a-f]{2}), by getting the integer represented by the two hex digits, and then casting it to a char (or something similar to that).

Then also, it looks like slashes (and other characters? See if you can find out...) always have a backslash in front of them, so do another find-and-replace for that.
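The two find-and-replace passes described above could be sketched roughly like this. It is only a minimal sketch under some assumptions: the escapes are taken to carry a leading backslash (as in the raw JSON posted in the comments, `\x3c` rather than `x3c`), each escape is treated as a single character rather than as one byte of a multi-byte UTF-8 sequence, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexEscapeDecoder {
    // Assumption: an escape is a literal backslash, 'x', and exactly two hex digits (e.g. \x3c).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9a-fA-F]{2})");

    public static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        // StringBuffer (not StringBuilder) keeps this usable on older Java/Android versions.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Turn the two hex digits into the character with that code.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '\' or '$' from being read as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        // Second pass: drop the backslash in front of slashes.
        return out.toString().replace("\\/", "/");
    }

    public static void main(String[] args) {
        String raw = "\\x3ca href\\x3d\\x22http:\\/\\/example.com\\x22\\x3elink\\x3c\\/a\\x3e";
        System.out.println(decode(raw));
    }
}
```

Because every escape is mapped to exactly one char, values above `\x7f` that are really UTF-8 bytes would need a different approach (collecting the bytes and decoding them together).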

edited May 23 '17 at 12:18


answered Sep 23 '10 at 06:05

Tyler


You should also try to figure out how Unicode characters above `ff` would be represented, and be sure to modify your approach accordingly. – Tyler Sep 23 '10 at 06:07
I faced the same problem retrieving Arabic JSON data from this link: https://www.facebook.com/feeds/page.php?id=103622369714881&format=json Can you please tell me what you did? – eng.ahmed Sep 05 '13 at 03:34

score 1 · Answer 3 · answered Mar 16 '12 at 02:27

Thanks!!

Take care: in the for loop the operator must be "<=", otherwise an escape at the very end of the string can't be decoded.

for(int i=0;i<=input.length()-4;i++) {..}

Cheers!
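To see why the bound matters, here is a small sketch that runs the same 4-character window scan with either `<=` or `<`. The class name and the `inclusive` flag are made up for this illustration, and the character conversion is simplified to a plain cast.

```java
public class LoopBoundDemo {
    // Same windowed scan as the converter above; 'inclusive' selects <= or < as the loop bound.
    static String decode(String input, boolean inclusive) {
        String output = input;
        int last = input.length() - 4; // start index of the last possible 4-char window
        for (int i = 0; inclusive ? i <= last : i < last; i++) {
            String part = input.substring(i, i + 4);
            if (part.startsWith("\\x")) {
                char c = (char) Integer.parseInt(part.substring(2), 16);
                output = output.replace(part, String.valueOf(c));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String raw = "a\\x3e";                  // the escape sits in the last window
        System.out.println(decode(raw, true));  // escape is decoded
        System.out.println(decode(raw, false)); // last window is skipped, escape stays as-is
    }
}
```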

answered Mar 16 '12 at 02:27

Nico Bigatti


score -2 · Answer 4 · answered Jul 31 '14 at 22:30

This works for me

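    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    // Caveat: URLDecoder also turns '+' into a space and treats any literal '%' as an escape.
    public static String convertUTF8Units_version2(String input) throws UnsupportedEncodingException
    {
         // Rewrite each "\x" prefix as "%" so the escapes look like URL percent-encoding.
         return URLDecoder.decode(input.replaceAll("\\\\x", "%"),"UTF-8");
    }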
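One thing to watch with this trick: URLDecoder applies form-decoding rules, so a `+` in the text becomes a space and a literal `%` is read as the start of an escape. On the plus side, consecutive escapes such as `\xc3\xa9` are decoded together as UTF-8 bytes. The sketch below shows both effects, plus one possible workaround that protects `%` and `+` first. The class and method names are hypothetical, and the workaround is just one way to handle it, not part of the original answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecoderCaveat {
    // The approach from the answer, wrapped so callers need not handle the checked exception.
    static String plain(String input) {
        return urlDecode(input.replaceAll("\\\\x", "%"));
    }

    // Hypothetical variant: protect literal '%' and '+' before rewriting "\x" to "%".
    static String protectedDecode(String input) {
        String prepared = input.replace("%", "%25")
                               .replace("+", "%2B")
                               .replaceAll("\\\\x", "%");
        return urlDecode(prepared);
    }

    private static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen in practice.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(plain("caf\\xc3\\xa9"));                // two bytes decoded as one UTF-8 character
        System.out.println(plain("1+1\\x3d2"));                    // the '+' is turned into a space
        System.out.println(protectedDecode("1+1\\x3d2 at 100%"));  // '+' and '%' survive
    }
}
```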

answered Jul 31 '14 at 22:30

jimbo

