I receive an XML document via a web service and use legacy code (built on dom4j) to perform some XML transformations. Loading/parsing the original XML into VTD-XML (VTDGen) works fine, with no exceptions thrown. However, after loading the XML into dom4j, I noticed that some of the element namespace declarations and attributes are rearranged. Apparently, this rearrangement causes VTD-XML to throw the following exception:

Exception: Name space qualification Exception: prefixed attribute not qualified

Line Number: 101 Offset: 1827

Here is the element at this line number in the original XML:

<RR_PerformanceSite:PerformanceSite_1_4 RR_PerformanceSite:FormVersion="1.4" xmlns:NSF_ApplicationChecklist="http://apply.grants.gov/forms/NSF_ApplicationChecklist-V1.1" xmlns:NSF_CoverPage="http://apply.grants.gov/forms/NSF_CoverPage-V1.1" xmlns:NSF_DeviationAuthorization="http://apply.grants.gov/forms/NSF_DeviationAuthorization-V1.1" xmlns:NSF_Registration="http://apply.grants.gov/forms/NSF_Registration-V1.1" xmlns:NSF_SuggestedReviewers="http://apply.grants.gov/forms/NSF_SuggestedReviewers-V1.1" xmlns:PHS398_CareerDevelopmentAwardSup="http://apply.grants.gov/forms/PHS398_CareerDevelopmentAwardSup_1_1-V1.1" xmlns:PHS398_Checklist="http://apply.grants.gov/forms/PHS398_Checklist_1_3-V1.3" xmlns:PHS398_CoverPageSupplement="http://apply.grants.gov/forms/PHS398_CoverPageSupplement_1_4-V1.4" xmlns:PHS398_ModularBudget="http://apply.grants.gov/forms/PHS398_ModularBudget-V1.1" xmlns:PHS398_ResearchPlan="http://apply.grants.gov/forms/PHS398_ResearchPlan_1_3-V1.3" xmlns:PHS_CoverLetter="http://apply.grants.gov/forms/PHS_CoverLetter_1_2-V1.2" xmlns:RR_Budget="http://apply.grants.gov/forms/RR_Budget-V1.1" xmlns:RR_KeyPersonExpanded="http://apply.grants.gov/forms/RR_KeyPersonExpanded_1_2-V1.2" xmlns:RR_OtherProjectInfo="http://apply.grants.gov/forms/RR_OtherProjectInfo_1_2-V1.2" xmlns:RR_PerformanceSite="http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4" xmlns:RR_PersonalData="http://apply.grants.gov/forms/RR_PersonalData-V1.1" xmlns:RR_SF424="http://apply.grants.gov/forms/RR_SF424_1_2-V1.2" xmlns:RR_SubawardBudget="http://apply.grants.gov/forms/RR_SubawardBudget-V1.2" xmlns:SF424C="http://apply.grants.gov/forms/SF424C-V1.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:codes="http://apply.grants.gov/system/UniversalCodes-V2.0" xmlns:globlib="http://apply.grants.gov/system/GlobalLibrary-V2.0">

Here is the same element after being loaded into dom4j:

<RR_PerformanceSite:PerformanceSite_1_4 xmlns:RR_PerformanceSite="http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4" xmlns:NSF_ApplicationChecklist="http://apply.grants.gov/forms/NSF_ApplicationChecklist-V1.1" xmlns:NSF_CoverPage="http://apply.grants.gov/forms/NSF_CoverPage-V1.1" xmlns:NSF_DeviationAuthorization="http://apply.grants.gov/forms/NSF_DeviationAuthorization-V1.1" xmlns:NSF_Registration="http://apply.grants.gov/forms/NSF_Registration-V1.1" xmlns:NSF_SuggestedReviewers="http://apply.grants.gov/forms/NSF_SuggestedReviewers-V1.1" xmlns:PHS398_CareerDevelopmentAwardSup="http://apply.grants.gov/forms/PHS398_CareerDevelopmentAwardSup_1_1-V1.1" xmlns:PHS398_Checklist="http://apply.grants.gov/forms/PHS398_Checklist_1_3-V1.3" xmlns:PHS398_CoverPageSupplement="http://apply.grants.gov/forms/PHS398_CoverPageSupplement_1_4-V1.4" xmlns:PHS398_ModularBudget="http://apply.grants.gov/forms/PHS398_ModularBudget-V1.1" xmlns:PHS398_ResearchPlan="http://apply.grants.gov/forms/PHS398_ResearchPlan_1_3-V1.3" xmlns:PHS_CoverLetter="http://apply.grants.gov/forms/PHS_CoverLetter_1_2-V1.2" xmlns:RR_Budget="http://apply.grants.gov/forms/RR_Budget-V1.1" xmlns:RR_KeyPersonExpanded="http://apply.grants.gov/forms/RR_KeyPersonExpanded_1_2-V1.2" xmlns:RR_OtherProjectInfo="http://apply.grants.gov/forms/RR_OtherProjectInfo_1_2-V1.2" xmlns:RR_PersonalData="http://apply.grants.gov/forms/RR_PersonalData-V1.1" xmlns:RR_SF424="http://apply.grants.gov/forms/RR_SF424_1_2-V1.2" xmlns:RR_SubawardBudget="http://apply.grants.gov/forms/RR_SubawardBudget-V1.2" xmlns:SF424C="http://apply.grants.gov/forms/SF424C-V1.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:codes="http://apply.grants.gov/system/UniversalCodes-V2.0" xmlns:globlib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_PerformanceSite:FormVersion="1.4">

The problem is regarding the attribute (at offset 1827, at the end of the element) in the new XML element: RR_PerformanceSite:FormVersion="1.4"

Either of the following removes the exception:

1. Adding the RR_PerformanceSite xmlns declaration for this element to the root element of the XML doc.
2. Replacing the new element with the original element.

This seems to suggest that the order of the attributes/namespace declarations affects VTD when parsing.
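The first workaround can be sketched roughly as follows. This is only an illustration and assumes the document is available as a JAXP DOM tree (the legacy code here uses dom4j, so the real change would look different). The class name `HoistNamespace`, the helper names, and the tiny sample document are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: declares a namespace prefix on the root element.
public class HoistNamespace {

    // Parse an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Add xmlns:prefix="uri" to the root element and serialize the result.
    static String declareOnRoot(String xml, String prefix, String uri) throws Exception {
        Document doc = parse(xml);
        doc.getDocumentElement().setAttributeNS(
                XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + prefix, uri);
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String uri = "http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4";
        // Made-up, heavily trimmed stand-in for the real document.
        String xml = "<root><RR_PerformanceSite:PerformanceSite_1_4"
                + " xmlns:RR_PerformanceSite=\"" + uri + "\""
                + " RR_PerformanceSite:FormVersion=\"1.4\"/></root>";
        System.out.println(declareOnRoot(xml, "RR_PerformanceSite", uri));
    }
}
```

The declaration on the child element is left in place; the sketch only adds a second one on the root, which is what workaround 1 describes.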

NOTE: I parse both XML docs (original and post-dom4j) with namespace awareness set to 'true'. Also, a new VTD object is created for each XML doc.

I tried to put 'RR_PerformanceSite:FormVersion="1.4"' at the beginning of the element like the original but that does not remove the exception. The offset in the error message is different due to the change of location of the attribute. Does the order of the xmlns declarations affect VTD?

I have looked at the VTDGen source code and cannot figure out why this exception is being thrown.

Why would dom4j be able to parse the new doc when VTD is not? Can anyone shed some light on this?

user1113792

1 Answer

It appears to be a bug in VTD-XML, related to namespace declaration order.

It is always reproducible using the following Java code:

import java.io.FileReader;

import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

public class SchemaTester {

    // Parses one of the test files with VTD-XML and prints every attribute.
    public static void main(String[] args) throws Exception {

        String bad = "C:/Temp/VTD_bad.xml";   // XML that triggers the exception
        String good = "C:/Temp/VTD_good.xml"; // XML that parses cleanly (swap in to compare)

        // Read the whole file into a string
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4 * 1024];
        FileReader fr = new FileReader(bad);
        int charsRead;

        while ((charsRead = fr.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, charsRead);
        }

        fr.close();

        String x = sb.toString();

        // Instantiate VTDGen and parse the document
        VTDGen vg = new VTDGen();
        vg.setDoc(x.getBytes("UTF-8"));
        vg.parse(true);  // set namespace awareness to true
        VTDNav vn = vg.getNav();

        // Select every attribute of every element
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("//*/@*");

        int i;
        while ((i = ap.evalXPath()) != -1) {
            // i is the index of the attribute name, i + 1 the index of its value
            System.out.println("\t\tAttribute ==> " + vn.toNormalizedString(i));
            System.out.println("\t\tValue ==> " + vn.toNormalizedString(i + 1));
        }
    }
}

The OP has uploaded the XML to https://gist.github.com/2696220
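For comparison, under the Namespaces in XML rules a prefix declared on an element applies to that element's own attributes, wherever the declaration sits among them. A minimal sketch using the JDK's built-in JAXP parser (not VTD-XML) illustrates this on a trimmed-down version of the element; the class name `NamespaceOrderCheck` and the shortened sample elements are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;

// Hypothetical check: does declaration order change how FormVersion resolves?
public class NamespaceOrderCheck {

    static final String NS = "http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4";

    // Returns the namespace URI the parser assigns to the root element's
    // FormVersion attribute, or null if it is not found in namespace NS.
    static String formVersionNamespace(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Attr attr = root.getAttributeNodeNS(NS, "FormVersion");
        return attr == null ? null : attr.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        // Prefixed attribute before the declaration (as in the original XML)
        String attrFirst = "<RR_PerformanceSite:PerformanceSite_1_4"
                + " RR_PerformanceSite:FormVersion=\"1.4\""
                + " xmlns:RR_PerformanceSite=\"" + NS + "\"/>";
        // Declaration before the prefixed attribute (as in the post-dom4j XML)
        String declFirst = "<RR_PerformanceSite:PerformanceSite_1_4"
                + " xmlns:RR_PerformanceSite=\"" + NS + "\""
                + " RR_PerformanceSite:FormVersion=\"1.4\"/>";

        System.out.println(NS.equals(formVersionNamespace(attrFirst)));
        System.out.println(NS.equals(formVersionNamespace(declFirst)));
    }
}
```

Both orderings should resolve the attribute to the same namespace with a conforming parser, which is why the behaviour seen with VTD-XML looks like a parser bug rather than a problem with the document.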

MrJames
  • the element snippets are the opening elements, I did not want to paste the entire XML. The XML document is well formed. I will try validating at w3schools, however, it is fine with dom4j ... it is VTD that is having the issue with the post-dom4j xml. – user1113792 May 11 '12 at 05:17
  • Could you please post some Java code of how are you parsing the doc? – MrJames May 11 '12 at 13:25
  • here is a snippet of the code that loads the bytes into VTD: VTDGen vGen = new VTDGen(); vGen.setDoc( submissionXmlBytes ); vGen.parse( true ); vNav = vGen.getNav(); – user1113792 May 11 '12 at 17:15
  • Try using getBytes(...) with the right encoding. Also what's the JDK version you are using? – MrJames May 11 '12 at 18:08
  • Hum... How can you run VTD-XML on JDK 1.5? The latest versions of VTD-XML (2.9 and 2.10) only support Java >= 6. What version of VTD-XML are you using? – MrJames May 12 '12 at 11:43
  • It's possible to recompile VTD-XML with other JDK out of the box, just edit `build.bat`. Ok I was able to use version 2.10 with JDK 1.5 and obtained the same console output. – MrJames May 12 '12 at 12:00
  • How can I upload two sample xml files? I have one pre-dom4j ... that loads successfully into VTD ... and one post-dom4j that does not load into VTD. I think working only with the element does not give you a true use-case. Your sample code above, works with and without the 'getBytes("UTF-8")'. – user1113792 May 14 '12 at 19:45
  • git://gist.github.com/2696220.git – user1113792 May 14 '12 at 19:59
  • Were you able to get to the files? – user1113792 May 14 '12 at 20:11
  • Yes, I can confirm this with your bad XML. – MrJames May 17 '12 at 14:14
  • I think this should be addressed to VTD-XML developers, it appears to be a bug, because if we change the 'bad' XML and put the namespace definition `xmlns:RR_PerformanceSite="http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4"` as the last element attribute it works! – MrJames May 17 '12 at 14:52
  • GREAT! How can I address this issue to the VTD-XML developers??? I have been all over the sourceforge site and cannot figure out how to get their attention with this issue. This is causing an issue for some users and I do NOT want to roll back to dom4j or jdom. You have any suggestions? Also, thx for your time ... I was thinking I was the only one seeing this issue. – user1113792 May 17 '12 at 16:11
  • You can add a topic on their VTD-XML sourceforge forum... you will need to register on sourceforge. https://sourceforge.net/projects/vtd-xml/forums/forum/379067 – MrJames May 17 '12 at 16:17
  • This is indeed a bug that has been fixed ... check out from cvs the following two files (URL included) FastLongBuffer.java http://vtd-xml.cvs.sourceforge.net/viewvc/vtd-xml/ximple-dev/com/ximpleware/FastLongBuffer.java FastIntBuffer.java http://vtd-xml.cvs.sourceforge.net/viewvc/vtd-xml/ximple-dev/com/ximpleware/FastIntBuffer.java Recompile the whole package with the script included in the 2.10 distribution – user1113792 May 21 '12 at 15:55
  • Thanks to jzhang @ ximpleware for a quick turnaround with the fix! – user1113792 May 21 '12 at 15:56