
I need to select all processing instructions with NAME="CONTENTTYPE", read the VALUE of each one, and return the concatenated values using XQuery/XPath.

My XML:

<REG >
    <MARKER MRKEID="SLREG:7.1" MRKTYPE="LD DU" MRKDATE="20130909" MRKTIME="10402688"/>
    <?METADATA NAME="CONTENTTYPE" VALUE="STATUTE"?>
    <?METADATA NAME="CONTENTTYPE" VALUE="LEGISLATIVEDOCUMENT"?>
    <?METADATA NAME="CONTENTTYPE" VALUE="PRIMARYSOURCE"?>
    <?METADATA NAME="SLTAXTYPE" VALUE="PRIMARYSOURCE"?>
</REG>

Expected output:

STATUTE
LEGISLATIVEDOCUMENT
PRIMARYSOURCE

Appreciate your help in writing the XQuery/XPath to get the output as above.

Thanks in Advance.

Regards, Hari

Bart
  • The name of those processing instructions is `METADATA`, and data such as `NAME="CONTENTTYPE" VALUE="STATUTE"` is unstructured text that you would need to parse with your own code. `@VALUE` is not going to work: it selects an attribute of that name, but only element nodes have attributes; processing instructions do not. – Martin Honnen Oct 25 '13 at 10:02

2 Answers


//processing-instruction('METADATA')[matches(., 'NAME="CONTENTTYPE" VALUE="[^"]*"')]/replace(substring-after(., 'VALUE="'), '"', '')

That's XPath 2.0.

Martin Honnen
  • Hi Martin, Thanks for the reply, but it is only returning first value ie: STATUTE. Please let me know how can I get all three values. – user2919291 Oct 25 '13 at 10:12
  • It should return a sequence with three string values for the input sample you posted. If you only get one value then your XPath processor or API is hiding part of the result. How do you evaluate the XPath? – Martin Honnen Oct 25 '13 at 10:33
  • Hi Martin, Please let me know can we get the same output using for loop. – user2919291 Oct 25 '13 at 11:53
  • @user2919291, which XPath API do you use? – Martin Honnen Oct 25 '13 at 11:57
  • Hi Martin, We are using JDOM. – user2919291 Oct 25 '13 at 12:17
  • @user2919291, tag your question as jdom and hopefully someone else will then be able to help with that API. It looks like calling `xpath.diagnose(context, false).getRawResults()` instead of `xpath.evaluate(context)` should work, but I don't code with Java and JDOM, and merely reading the API docs is probably not the best basis for showing you working code. So tag your question as jdom; then hopefully someone else can help with concrete code to evaluate the posted XPath with JDOM so that all items in the sequence of strings are returned. – Martin Honnen Oct 25 '13 at 12:41

Tagging with JDOM helped me find this.

Long answer coming. XPath has no native ability to parse the 'standard' way of adding 'attributes' to ProcessingInstructions, so if you want to concatenate the values as part of a single XPath expression, I think you are out of luck. Actually, Martin's answer looks promising, but it will return a number of String values, not ProcessingInstructions. In JDOM 2.x you would need to pass Filters.fstring() to the XPathFactory compile(...) call, and you would then get a List<String> result from evaluate(doc). I think it's simpler to do it outside of XPath, especially given that JDOM 2.x has only limited support for XPath 2.0, and only by using the Saxon library.

As for doing it programmatically, JDOM 2.x helps a fair amount. Taking your example XML, I did it two ways. The first way uses a custom Filter on the XPath result set. The second way does effectively the same thing, but restricts the PIs further inside the loop.

import java.io.File;

import org.jdom2.Document;
import org.jdom2.ProcessingInstruction;
import org.jdom2.filter.AbstractFilter;
import org.jdom2.filter.Filter;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

// The class name is only a placeholder so the snippet compiles.
public class ContentTypeFilterExample {

    public static void main(String[] args) throws Exception {
        SAXBuilder saxb = new SAXBuilder();
        Document doc = saxb.build(new File("data.xml"));

        // This custom filter will return PI's that have the NAME="CONTENTTYPE" 'pseudo' attribute...
        @SuppressWarnings("serial")
        Filter<ProcessingInstruction> contenttypefilter = new AbstractFilter<ProcessingInstruction>() {

            @Override
            public ProcessingInstruction filter(Object obj) {
                // because we know the XPath expression selects Processing Instructions
                // we can safely cast here:
                ProcessingInstruction pi = (ProcessingInstruction)obj;
                if ("CONTENTTYPE".equals(pi.getPseudoAttributeValue("NAME"))) {
                    return pi;
                }
                // returning null tells JDOM to drop this node from the result
                return null;
            }

        };

        XPathExpression<ProcessingInstruction> xp = XPathFactory.instance().compile(
                // search for all METADATA PI's.
                "//processing-instruction('METADATA')",
                // The XPath will return ProcessingInstruction content, which we
                // refine with our custom filter.
                contenttypefilter);

        // collect one VALUE per line
        StringBuilder sb = new StringBuilder();
        for (ProcessingInstruction pi : xp.evaluate(doc)) {
            sb.append(pi.getPseudoAttributeValue("VALUE")).append("\n");
        }
        System.out.println(sb);
    }
}

This second way uses the simpler, pre-defined Filters.processinginstruction() but then does the additional filtering manually.

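import java.io.File;

import org.jdom2.Document;
import org.jdom2.ProcessingInstruction;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

// The class name is only a placeholder so the snippet compiles.
public class ContentTypeLoopExample {

    public static void main(String[] args) throws Exception {
        SAXBuilder saxb = new SAXBuilder();
        Document doc = saxb.build(new File("data.xml"));

        XPathExpression<ProcessingInstruction> xp = XPathFactory.instance().compile(
                // search for all METADATA PI's.
                "//processing-instruction('METADATA')",
                // Use the pre-defined filter to set the generic type
                Filters.processinginstruction());

        StringBuilder sb = new StringBuilder();
        for (ProcessingInstruction pi : xp.evaluate(doc)) {
            // skip PI's whose NAME pseudo-attribute is not CONTENTTYPE
            if (!"CONTENTTYPE".equals(pi.getPseudoAttributeValue("NAME"))) {
                continue;
            }
            sb.append(pi.getPseudoAttributeValue("VALUE")).append("\n");
        }
        System.out.println(sb);
    }
}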
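
For comparison, here is a rough sketch of the same "select the PIs with XPath, parse the data outside of XPath" idea using only the JDK (javax.xml.xpath, which is XPath 1.0) instead of JDOM. The class name PiContentTypes, the helper pseudoAttribute, and the simple regular expression (double-quoted values only) are assumptions made for illustration; they are not part of JDOM or of the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class PiContentTypes {

    // Assumption: pseudo-attributes look like NAME="value" with double quotes.
    private static final Pattern PSEUDO_ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // The sample document from the question, inlined so the sketch is self-contained.
    static final String SAMPLE =
            "<REG>"
            + "<MARKER MRKEID=\"SLREG:7.1\" MRKTYPE=\"LD DU\" MRKDATE=\"20130909\" MRKTIME=\"10402688\"/>"
            + "<?METADATA NAME=\"CONTENTTYPE\" VALUE=\"STATUTE\"?>"
            + "<?METADATA NAME=\"CONTENTTYPE\" VALUE=\"LEGISLATIVEDOCUMENT\"?>"
            + "<?METADATA NAME=\"CONTENTTYPE\" VALUE=\"PRIMARYSOURCE\"?>"
            + "<?METADATA NAME=\"SLTAXTYPE\" VALUE=\"PRIMARYSOURCE\"?>"
            + "</REG>";

    // Returns the value of the named pseudo-attribute in the PI data, or null if absent.
    static String pseudoAttribute(String piData, String name) {
        Matcher m = PSEUDO_ATTR.matcher(piData);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return m.group(2);
            }
        }
        return null;
    }

    // Collects the VALUE of every METADATA PI whose NAME is CONTENTTYPE.
    static List<String> contentTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // XPath 1.0 can select the PIs by target name, but cannot look inside their data.
            NodeList pis = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//processing-instruction('METADATA')", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < pis.getLength(); i++) {
                // getData() is the raw text after the target, e.g. NAME="CONTENTTYPE" VALUE="STATUTE"
                String data = ((ProcessingInstruction) pis.item(i)).getData();
                if ("CONTENTTYPE".equals(pseudoAttribute(data, "NAME"))) {
                    values.add(pseudoAttribute(data, "VALUE"));
                }
            }
            return values;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join("\n", contentTypes(SAMPLE)));
    }
}
```

The filtering happens in the Java loop rather than in the XPath expression, which is the same division of labour as the second JDOM version above; JDOM's getPseudoAttributeValue simply replaces the hand-written regular expression.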
rolfl