
I have been trying to modify an XML file using VTD-XML. The XML was received from a Java (JAX-WS) web service as a String. The HTTP response header from the server has content type text/xml and charset utf-8.

Here is the code :

private static byte[] getDataFromFile(String filePath) throws IOException {
    File file = new File(filePath);
    FileInputStream fileInputStream = new FileInputStream(file);
    byte[] byteArray = new byte[(int) file.length()];
    fileInputStream.read(byteArray);
    String fileData = new String(byteArray);
    byteArray = fileData.getBytes("UTF-16");
    return byteArray;
}

private static void cutOffXmlByXpath(String xpathQuery, String inputFilePath, String outputFilePath) throws Exception {
    byte[] byteArray = getDataFromFile(inputFilePath);

    VTDGen vg = new VTDGen();
    vg.setDoc(byteArray);
    vg.parse(false);
    VTDNav vn = vg.getNav();

    AutoPilot ap = new AutoPilot(vn);
    ap.selectXPath(xpathQuery);

    XMLModifier xm = new XMLModifier(vn);

    while((ap.evalXPath())!=-1) {
        xm.remove(vn.getElementFragment());
    }

    xm.output(outputFilePath);
}


public static void main(String[] args) {
    try {
        cutOffXmlByXpath("//Part[@identifier != 'ID Page. Interview and Profile Form' and @identifier != 'Reports']", FILE_PATH, OUTPUT_FILE_PATH);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The declaration at the top of the XML is:

<?xml version="1.0" encoding="utf-16"?>

This is why I convert the bytes read from the file to UTF-16 in the getDataFromFile() method. Otherwise, the code throws an exception stating that it cannot switch the encoding to UTF-16.
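To make the effect of that conversion concrete, here is a minimal sketch of what decoding bytes into a String and re-encoding them with `getBytes("UTF-16")` produces. The class and method names (`Utf16RoundTrip`, `reencodeAsUtf16`) are hypothetical, and the explicit UTF-8 decode is an assumption; the code in the question uses the platform default charset instead. It only illustrates the transcoding step and is not a diagnosis of the exception.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf16RoundTrip {
    // Mimics the transcoding in getDataFromFile(): decode the bytes, then
    // re-encode them as "UTF-16". Decoding as UTF-8 is an assumption here;
    // the question's code relies on the platform default charset.
    static byte[] reencodeAsUtf16(byte[] raw) throws UnsupportedEncodingException {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes("UTF-16");
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = "<a/>".getBytes(StandardCharsets.UTF_8);
        byte[] out = reencodeAsUtf16(raw);
        // Java's "UTF-16" encoder writes a big-endian BOM (0xFE 0xFF)
        // followed by two bytes per BMP character.
        System.out.printf("%d bytes in, %d bytes out%n", raw.length, out.length);
        System.out.printf("first two bytes: 0x%02X 0x%02X%n", out[0] & 0xFF, out[1] & 0xFF);
    }
}
```

The byte array handed to the parser therefore no longer matches the bytes on disk: it is a different length and starts with a BOM that the original file may not have had.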

With this conversion in place, the code above throws the following exception:

java.lang.IndexOutOfBoundsException
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at com.ximpleware.XMLModifier.output(XMLModifier.java:2068)
at com.ximpleware.XMLModifier.output(XMLModifier.java:2193)
at Main.cutOffXmlByXpath(Main.java:111)
at Main.main(Main.java:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

If I change the encoding of the file to UTF-8 and modify the getDataFromFile() method accordingly (that is, read the bytes from the file either without specifying any encoding or with UTF-8 as the encoding), everything works fine.
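For comparison, reading the file without any transcoding can be sketched as below. The names `RawFileReader` and `readRaw` are hypothetical. `Files.readAllBytes` is used instead of a single `FileInputStream.read` call because a single `read` is not guaranteed to fill the whole array.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RawFileReader {
    // Returns the file's bytes exactly as stored on disk, with no
    // String decoding or re-encoding in between.
    static byte[] readRaw(String filePath) throws IOException {
        return Files.readAllBytes(Paths.get(filePath));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RawFileReader <file>");
            return;
        }
        byte[] bytes = readRaw(args[0]);
        System.out.println("read " + bytes.length + " bytes");
    }
}
```

The resulting array could then be passed to `vg.setDoc(byteArray)` as in the code above, so that the parser sees the same bytes the XML declaration describes.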

Any help would be appreciated.

vtd-xml-author
Saad galib
  • Can you post a simplified version of XML file with that encoding header? – vtd-xml-author Mar 22 '16 at 19:01
  • Thank you for your prompt response. I have uploaded a file [here](https://drive.google.com/file/d/0B6soN-_-NHx6VHhFUWJhMF9pcFE/view) using which I have been able to reproduce the exception. The file is utf-16 encoded and I have modified the getDataFromFile() method accordingly(that is, to read raw bytes only without specifying any encoding) to pass to vtd-xml. – Saad galib Mar 23 '16 at 08:18
  • Can you send this XML file or better the entire test case to me via email to jzhang@ximpleware.com? – vtd-xml-author Mar 23 '16 at 18:21
  • @vtd-xml-author I have sent you the file. Please let me know when you have taken a look at the issue. – Saad galib Mar 24 '16 at 04:43
  • I will get back to you asap... – vtd-xml-author Mar 24 '16 at 05:10
  • I checked your XML file again. It is not UTF-16 encoded at all; it is UTF-8 encoded... – vtd-xml-author Mar 25 '16 at 23:41
  • I didn't notice the parseFile() method before. That's definitely a lot better. About the encoding of the file, if I run this on linux, `file -i utf-16.xml` I get the result : `utf-16.xml: application/xml; charset=utf-16be`. Am I making any mistake? – Saad galib Mar 26 '16 at 12:12
  • I have tried your updated utf-16.xml on my computer and do not see the exception you describe... can you download the latest 2.12 and give it a spin on your end? – vtd-xml-author Mar 30 '16 at 21:36
  • Yes, sure. I will do that and get back to you. – Saad galib Mar 31 '16 at 03:31
  • Saad, any updates? – vtd-xml-author Apr 04 '16 at 22:33
  • Sorry that I couldn't get back earlier. It seems vtd-xml 2.12 is not available on Maven. I have downloaded it from here https://sourceforge.net/projects/vtd-xml/?source=typ_redirect but I cannot find XMLModifier in it. – Saad galib Apr 05 '16 at 06:30

1 Answer


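I loaded everything on my end into Eclipse. The first byte I got is -17, which is 0xEF in hexadecimal and is not a valid first byte for a UTF-16 BOM. The first three bytes printed below (0xEF 0xBB 0xBF) are in fact the UTF-8 BOM. FYI, a UTF-16 BOM should be 0xFF 0xFE (little-endian) or 0xFE 0xFF (big-endian), so the file fails in the parsing routine when the declaration says utf-16.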

private static byte[] getDataFromFile(String filePath) throws IOException {
        File file = new File(filePath);
        FileInputStream fileInputStream = new FileInputStream(file);
        byte[] byteArray = new byte[(int) file.length()];
        //byteArray[0]=(byte)0xff;
        //byteArray[1]=(byte)0xfe;
        //byteArray[2]=0x00;

        fileInputStream.read(byteArray);

        System.out.println(" first byte "+byteArray[0]);
        System.out.println(" second byte "+byteArray[1]);
        System.out.println(" third byte "+byteArray[2]);
        System.out.println(" length "+file.length());
        return byteArray;
    }

The output and exception log look like this:

 first byte -17
 second byte -69
 third byte -65
 length 192
com.ximpleware.ParseException: XML decl error: Can't switch encoding to UTF-16
Line Number: 1 Offset: 39
    at com.ximpleware.VTDGen.matchUTFEncoding(VTDGen.java:2241)
    at com.ximpleware.VTDGen.process_dec_attr(VTDGen.java:3385)
    at com.ximpleware.VTDGen.parse(VTDGen.java:2632)
    at DOMTest.removeNode.cutOffXmlByXpath(removeNode.java:28)
    at DOMTest.removeNode.main(removeNode.java:46)
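The byte values printed above can be checked against the standard BOM signatures with a small helper. This is a sketch only: `BomSniffer` and `detect` are hypothetical names, not part of VTD-XML, and the check covers just the UTF-8 and UTF-16 BOMs (UTF-32 is ignored).

```java
public class BomSniffer {
    // Classifies a byte array by its leading byte order mark, if any.
    static String detect(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Signed byte values as Java prints them.
        System.out.println(detect(new byte[] {-17, -69, -65})); // 0xEF 0xBB 0xBF
        System.out.println(detect(new byte[] {-2, -1, 0}));     // 0xFE 0xFF 0x00
    }
}
```

Java prints bytes as signed values, so -17, -69, -65 correspond to 0xEF 0xBB 0xBF, while -2, -1 correspond to 0xFE 0xFF.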
vtd-xml-author
  • Yes, You are getting the exact same error. I am using 2.11. And I am also using the same file. Thanks. – Saad galib Mar 24 '16 at 08:22
  • Sorry, my mistake. Please modify the getDataFromFile() method like this to reproduce the exception. `private static byte[] getDataFromFile(String filePath) throws IOException { File file = new File(filePath); FileInputStream fileInputStream = new FileInputStream(file); byte[] byteArray = new byte[(int) file.length()]; fileInputStream.read(byteArray); return byteArray; }` – Saad galib Mar 24 '16 at 10:54
  • I am getting the first byte to be -2 if I print the 0th index of the byte array. Any clue? – Saad galib Mar 26 '16 at 12:16
  • How big is your file? 192 bytes long? I printed the first three bytes and the length of the file... Can you verify the file you sent me, or resend it? – vtd-xml-author Mar 26 '16 at 22:22
  • I have sent you the file again. The file should be 380 bytes long. I am getting the first three bytes to be -2, -1 and 0. Let me know what you find. Thanks. – Saad galib Mar 27 '16 at 03:59