
I'm trying to parse XML files produced by a third-party tool (HTC Sync Manager), since the tool itself no longer works.

XML parsing is simple enough, but I'm stuck trying to decode binary image data serialized into those XML files. As an example, I have an 886-character string which I know represents an image. The full string is:

/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAIBAQEBAQIBAQECAgICAgQDAgICAgUEBAMEBgUGBgYFBgYGBwkIBgcJBwYGCAsICQoKCgoKBggLDAsKDAkKCgr/2wBDAQICAgICAgUDAwUKBwYHCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgr/w AARCAA8AFADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm p6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGV mZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD9rP2sfh1Y/FX9mbxr8Nrwf8hTw7dKfyJrwX9gS++Bvxf+A9hoen+CtMQ678OrO18R21qnkfarq2a6tLkY/wA9a+iP2ivFx8Cfs9eOPGFr10zw5d zj8Lc4r8+PiVq+tf8ABKGb4Z/tG+GdBtrnRvHXgO20PxLpF3eYtrXXfs1tc/av/JWtobHJiT5S/wCCj3xUHg8azY+CrDUtG1m1vPs15d3d59o/t215/wBKrjvh5Z6F498FWPgXRPFmpaKLnGtWmk+Hrv7P9q+y2v8Ay7Vw/wC2R8ede/ar1bxh/wAJRoWm6dqNrZ3Wo6QdJs//AAKr5I8DfEj4k+JfiPpF9omvalp2oab/AMS6z 
+yXn2b7LXzmJ/31hkeIwf8AtWGP0Isvhv8ADWx8UnXfGv8Abeoi1s+Td6PXl91/wlnxK+ANxrhGpa14hudYuvslpaf8+vP/AC7VzF3efGbwYYLHWvFut/6L1+16vXLWXxt/Zu+FPiorrvxK1vRtYtv9JP2S7uqpf9QwLLcZ/wAxJ9MfCX4C+FPiT8C9I/4Sn7TcafqWj2upauftn/PtRZfBH9nu08PGxsfhP4kFt9s+0/ZLvV68n+GGs+FPHngo6j8O/Het3Ojf8e32T+17qvaPgx8HvCfiPwUb6+13WxcfbP8AoY7rmj/ZDnxGGxhx37Y1np998LNI0Kz8J3OnW9reWttaXd3eVteHbyxs/wBnbR/hRomu22o6zc/8eVdv4g/Zg+HHjHS/sPin+27i39bvxHc1X8OfsTfAiyJvrPT9btvsv/Ux3Vc2J/2rY58L9cwh+hn/AAVH/avj0H9jnxV4W0HQbn7RrxtNOtLr/r5ucfyArzL9s/4pfDX9pT9nbQfAvxq+zabo1r4x0u3/AOva1uj/AGX/AO3Vfkf8bf2wfjr4x8PWFj8RPiTc6jb2usWtz/pdef8AxO/4KRfErWdJuPA3/Cd/2l/062le3hcywbOi5s6qL74V/HzUPhR4ov8A+0fsv2rTRq3/AD82teUeEtHOkeKbi+ay/cWt3/pZrjvFvxg+JXxI+JR17W9Q/wBItbz/AEOv0Z/ZO8N2Vn+wJ4h/aCXQtN/tjXLz+ztIu9Ws/tP2W1tf+XW2/wC3q5rlX+1HnLDfVc4+sf8AUKePfG3R/HV5dXP23xb/AKMOhtLOvB/jf8Er7x9c2+ueFvswuNLs/s32T7Z/pNfaHivweNZ+Blh/xPrn+1xef8Tj/TK+f9V8OeE7zxnp/hU3/wBpguLz7NeUYnLcYfV4jO8HiuD8NmP/AEFHL/ss+MvjR8NNK/4VR/wrX9//AKVqP2q7vPs1fXHwZ+Nnizwz4LNgPAtt9oF59p/0vV/s2a8f1b4bX3gL4qwWHhix+021zZ8m7vP+ParHiL4keE7LXofCnxD1D+xre568Vz/VsZ/zDYY5ksH/ALticSfQA/aR8WXd5caE3wZ/0i1/59PEVrXUaR8bdc0bw/8A8JX43+C/iS20e1/59LzTLn/27rz/AEn42/s2ppNvoXhfxZptzcf8e2j2n2P/AI+etctquj39jpOn6HeWP+k3N59pvPslc2WYbGYvB4rE4nDHjYjE/VMZ9Wwx8k/tH/8AEmSwKn/SLn/j7/WuH+Df7IHiD4x6N4g+I3w8srrUdX0u8/0PSbSz/wCPmvcNX8HWHxH+Kdz4UvrL7R9m0fm0x/09V9P/AGL9l/4U/wDBOPxxqHw88dab/wAJhc2f2b/RO3+lf6Va17WWZa8Vi39YObMsSfn/AOH/ANnjXtG1XSNb8a6DqR/t29+zWV39juf7Ntf+3n/l7r9KPjdeWHwE+Gnwx/Yz0XT/AO0bjQdH+06xaWh+zf8AE0/4+rqvRv2R/ixfft4ftRfDrwV4L/Z5utG+Fmgata3Ruls/tRP2X/j2+0/8+te9eI/2D/FWr/to+MPiv43+FAufD11e40gZ+zf9vVzc/wDPrXs/VsHhMX/sw8ThsZiuG8VhsN/vJ+Z1n8EPjp8SdV1Gx+Hvw21u5uLX/SLwWln/AMe1V/A37I+vHwrc/Ej4hDUtF0+1vPs2jj/l5+1V+lHx5+KngXwz4Vuf2b/h5ff2KReWn2y7tLP7NbXVfIHj/wAeaD4xtNA+BQ1+2ufEH2z7MLv/AJdq6MRszxOEeAMnyr/kZf7UfM/iD4w3+jj+3PGv/E5/sO8+zXn2T/R/tVrR4stLLx5d6Pr2jfCfUrm4/wCXy0u7z7T/AKLXH/G7wHrvwD1TX/A3ijxZbXOoXOsfaftf2P7NXp/wx8X6hrOgf8Ilf/6N9qvLW2+12n/HzXx+I/3PFH1ed/75hvqxz+lfFX7b8X/BHw2GhW1vp+hXl19ju/8An6r6h
tfEf/ErucHpXzf4h0bwJq9zceONb0PUra20y8/0v+1rP/Sbau/8Qax4s8BfB+DxX8L9dttZnttY63f+k/6LXflmd/VVhsuy48bEZH9b/wBpzE8o+He7SPFPjD7cfs3jG5vLXTbO0/59f9F/0qvoD9lb9gLT/j5r9x4T8UH/AIRw21n9p/48/tNzdf8ATrbVZ/Yo+FHgTX7PVtR13Rlu9S1Pxhqn27WZzm8l/wBK7y/er9Lfhr8HvBnwS/4Jz6t8YfAa3UXijWtDnaXXZ7kvcWxJYfuWwNnQetPC4h6nvYfAxxO55ivhD4r/ALF1p8Ofht8FfAlsPC+g2d1/Y/w/8J/6Trep/wDXzVj9q/8Abv8AjHpHhaxv/iDoNr4MJxcHw/q13a3N1bdf+XavhT4WftWfH74ReIfGvjPwv8RL2XUWvLTbcahIZinX7uTxXkvxw+J3jv4x/F6wj8aeIZnF/ef6X9nOzzfr1reGIc/4eh9BLBYeOD/eq523xZ/bO8J3uq6hfWN9me6vPtN3d3f/AC9f9u1fH3xi/ae8V3viC3HhbQvs32n/AI87uu3+O/hHQPAOrSReFrIW4jvPlxz61wP7JHhbRvHfxT8nxHbmVTLsODjiubMP6/E8LHZ5i6C/dOx7v+1La2Hx6+FnwY+K/jX/AEbX9T0f7Nd3f2P/AI+utV/hL4O8W6v4VOht4rubf7NrH2kfZP8Ap1r0rxDcR+IPF2nfD/xDZQX2l6VZ/wCh21ym4JXTXPhjwn4D/Z4tvF3hjwtY2+oXtxvnuBDyx8/zPX+9XPhv98Z5WdUqHYz/AA18H/2hdY0mw08Xupalo1zef8en2T/j6r6A/aF/4Ju6/wCAv2doPitf67olxqF1/wAhjSvtf2a5tev/AD917J/wRX0rRv2pLO+u/i5o1tctYWY+z/Y08kLz/smvpOTSPCXx38f3/wCz58ZfAmjeJNC8OSbNIm1SzD3cA9psg1zYDDKtjP3up62BynFUcHajWaP/2Q==

Base64 decoding it leads to garbage, so that's not the answer.

How can I decode this kind of data?

Robert Munteanu
  • Could you post the entire string? – Robby Cornelissen Aug 22 '14 at 08:33
  • Can you also tell us what you know about the image? Height and width and type (photo, icon, etc.) in particular. – chiastic-security Aug 22 '14 at 08:35
  • It is quite striking that it's exactly the right character set for base64. But I think we will need the rest of the string to be able to advise any further. – chiastic-security Aug 22 '14 at 08:46
  • @RobbyCornelissen - I've added the full image string – Robert Munteanu Aug 22 '14 at 09:19
  • Base64 decoding it doesn't lead to garbage. It's an 80x60 JPG image of a woman holding a child. If you show us the code you're using to obtain the value and decode it, perhaps we will be able to tell you how to fix your code. – JLRishe Aug 22 '14 at 09:23
  • It didn't help that when you posted the initial part of the string, it was wrong! The full string doesn't start with the thing you first posted (`rb...`). – chiastic-security Aug 22 '14 at 09:42
  • @JLRishe - yup, the problem was my XML parsing code. Thanks for the info and see my answer for details. – Robert Munteanu Aug 22 '14 at 09:42
  • @chiastic-security - I posted a different image the first time. I had to hunt down for one I think no one will mind being shared. The first one I wasn't sure about, therefore I only posted a substring. – Robert Munteanu Aug 22 '14 at 09:43

1 Answer


As it happens, the problem was somewhere else. I am parsing the XML file using a SAX ContentHandler, e.g.

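import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

reader.setContentHandler(new DefaultHandler2() {

    private String currentLocalName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        // Remember which element we are currently inside.
        currentLocalName = localName;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Read data here. Note that the parser may call this method
        // several times for the text of a single element.
    }
});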

The problem was that the parser may invoke the characters method several times for the text of a single element. My code overwrote the stored value on each call, so only the last chunk of the raw image data was kept.
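One common way around this is to append every chunk to a buffer and read the complete text only in endElement. The following is a minimal sketch, not the code I actually used: the class name TextCollector, the collect helper and the element name passed in are hypothetical, and it matches on qName because the default SAXParserFactory is not namespace-aware.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the full text of one element,
// even when the parser delivers it in several characters() calls.
class TextCollector extends DefaultHandler {

    private final String targetName;                 // element to capture (assumed name)
    private final StringBuilder buffer = new StringBuilder();
    private boolean inside;
    private String result;

    TextCollector(String targetName) {
        this.targetName = targetName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (targetName.equals(qName)) {
            inside = true;
            buffer.setLength(0);                     // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            buffer.append(ch, start, length);        // append, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (targetName.equals(qName)) {
            inside = false;
            result = buffer.toString();              // the text is complete only here
        }
    }

    String getResult() {
        return result;
    }

    // Parses an XML string and returns the text of the last matching element.
    static String collect(String xml, String elementName) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextCollector handler = new TextCollector(elementName);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getResult();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The buffer is reset in startElement and read in endElement, so the result does not depend on how the parser happens to split the text.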

The raw data is indeed Base64 encoded.
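Once the complete string is available, decoding it could look roughly like the sketch below. The class and method names are made up for illustration. It uses the MIME decoder from java.util.Base64 (Java 8 or later), which skips characters outside the Base64 alphabet such as stray spaces or line breaks, and it checks for the JPEG start marker FF D8 FF, which is what the leading "/9j/" of the string above decodes to.

```java
import java.util.Base64;

// Hypothetical helper for decoding the image text collected from the XML.
class ImageDecoder {

    // Decodes Base64 text; the MIME decoder ignores whitespace and
    // any other characters that are not part of the Base64 alphabet.
    static byte[] decode(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    // Returns true if the data starts with the JPEG marker bytes FF D8 FF.
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }
}
```

If looksLikeJpeg returns false for data that should be an image, a truncated or overwritten input string is a likely cause, which is exactly what happened in my case.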

Robert Munteanu