
I need to look up many values in one XML file, which looks like:

<cat catid="some_generic_text_followed_by_something_specific1" pid="x1">
</cat>
<cat catid="some_generic_text_followed_by_something_specific1" pid="x2">
</cat>
<cat catid="some_generic_text_followed_by_something_specific2" pid="x3">
</cat>
<cat catid="some_generic_text_followed_by_something_specific2" pid="x4">
</cat>
<cat catid="some_generic_text_followed_by_something_specific3" pid="x5">
</cat>

So the task is to identify keywords such as "specific1" and "specific2" and find all pid values that belong to any of these keywords. In this case I find x1, x2, x3 and x4, but not x5.
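To make the matching rule concrete, here is a rough Python sketch using the standard-library `xml.etree.ElementTree` module (an illustration of the logic only, not an XSLT solution). It wraps the sample in a hypothetical `<root>` element so it parses, applies a substring test to `catid` as described above, and `find_pids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <root> element so it parses.
CAT_XML = """<root>
<cat catid="some_generic_text_followed_by_something_specific1" pid="x1"></cat>
<cat catid="some_generic_text_followed_by_something_specific1" pid="x2"></cat>
<cat catid="some_generic_text_followed_by_something_specific2" pid="x3"></cat>
<cat catid="some_generic_text_followed_by_something_specific2" pid="x4"></cat>
<cat catid="some_generic_text_followed_by_something_specific3" pid="x5"></cat>
</root>"""

def find_pids(cat_root, keywords):
    """Return the pid of every cat whose catid contains any of the keywords."""
    return [
        cat.get("pid")
        for cat in cat_root.iter("cat")
        if any(word in cat.get("catid", "") for word in keywords)
    ]

pids = find_pids(ET.fromstring(CAT_XML), ["specific1", "specific2"])
print(pids)  # ['x1', 'x2', 'x3', 'x4']
```

A substring test like this would also match a catid ending in, say, `specific10`, which is one reason an exact match is preferable.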

I then have to look up those pid values in another XML file with many nodes:

<prod prod-id="x1">
    <display-name xml:lang="x-default">some text</display-name>
</prod>
<prod prod-id="x2">
    <display-name xml:lang="x-default">some more text</display-name>
</prod>
<prod prod-id="x5">
    <display-name xml:lang="x-default">some text</display-name>
</prod>

and, in bulk, update the display-name of each prod whose prod-id is one of those pid values by inserting "inserted keyword" before the existing text. For the first example the result would be "inserted keyword some text". In essence, I'm prepending text.
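The prepending step can likewise be sketched in Python, assuming the list of pid values has already been collected. `prepend_keyword` is a hypothetical helper, and the sample is again wrapped in a hypothetical `<root>` element.

```python
import xml.etree.ElementTree as ET

# Sample prod document from the question, wrapped in a hypothetical <root>.
PROD_XML = """<root>
<prod prod-id="x1"><display-name xml:lang="x-default">some text</display-name></prod>
<prod prod-id="x2"><display-name xml:lang="x-default">some more text</display-name></prod>
<prod prod-id="x5"><display-name xml:lang="x-default">some text</display-name></prod>
</root>"""

def prepend_keyword(prod_root, pids, new="inserted keyword"):
    """Prefix the display-name text of every prod whose prod-id is in pids."""
    wanted = set(pids)
    for prod in prod_root.iter("prod"):
        if prod.get("prod-id") in wanted:
            for name in prod.iter("display-name"):
                if name.text:  # only touch display-names that already have text
                    name.text = f"{new} {name.text}"
    return prod_root

root = prepend_keyword(ET.fromstring(PROD_XML), ["x1", "x2", "x3", "x4"])
for prod in root.iter("prod"):
    print(prod.get("prod-id"), "->", prod.find("display-name").text)
```

x1 and x2 get the prefix, while x5 is left unchanged because it is not in the pid list.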

I can use any XSLT version and will probably use a tool such as XMLSpy or similar.

I found a somewhat similar question and answer, "XSLT to lookup values in one XML and replace in another XML file", but I don't understand XSLT well enough to adapt it to my example.

UPDATE

I have a minor correction to my first XML sample above. It actually looks like this:

<cat catid="c1">
   <parent>specific1</parent>
</cat>
<cat catid="c2">
   <parent>specific1</parent>
</cat>
<cat catid="c3">
   <parent>specific1</parent>
</cat>
<cat catid="c4">
   <parent>specific2</parent>
</cat>
<cat catid="c5">
   <parent>specific2</parent>
</cat>
<cat-assign catid="c1" pid="x13"/>
<cat-assign catid="c1" pid="x14"/>
<cat-assign catid="c1" pid="x15"/>
<cat-assign catid="c2" pid="x24"/>
<cat-assign catid="c2" pid="x43"/>
<cat-assign catid="c2" pid="x44"/>
<cat-assign catid="c3" pid="x45"/>
<cat-assign catid="c4" pid="x27"/>
<cat-assign catid="c5" pid="x31"/>
<cat-assign catid="c5" pid="x32"/>
<cat-assign catid="c5" pid="x33"/>
<cat-assign catid="c5" pid="x34"/>
  1. I need to look for an exact match on "specific1", not a contains() test
  2. Then find the matching catid values (there will be several)
  3. In the same XML, find each cat-assign with one of those catid values
  4. Finally, take the pid values and use them to look up the other XML document
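The four steps above can be sketched in Python with the standard-library ElementTree module, purely to illustrate the lookup chain. It assumes step 3 refers to the cat-assign elements shown in the sample, wraps the fragment in a hypothetical `<root>` element so it parses, and `lookup_pids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Updated cat sample from the question, wrapped in a hypothetical <root>.
CAT_XML = """<root>
<cat catid="c1"><parent>specific1</parent></cat>
<cat catid="c2"><parent>specific1</parent></cat>
<cat catid="c3"><parent>specific1</parent></cat>
<cat catid="c4"><parent>specific2</parent></cat>
<cat catid="c5"><parent>specific2</parent></cat>
<cat-assign catid="c1" pid="x13"/>
<cat-assign catid="c1" pid="x14"/>
<cat-assign catid="c1" pid="x15"/>
<cat-assign catid="c2" pid="x24"/>
<cat-assign catid="c2" pid="x43"/>
<cat-assign catid="c2" pid="x44"/>
<cat-assign catid="c3" pid="x45"/>
<cat-assign catid="c4" pid="x27"/>
<cat-assign catid="c5" pid="x31"/>
<cat-assign catid="c5" pid="x32"/>
<cat-assign catid="c5" pid="x33"/>
<cat-assign catid="c5" pid="x34"/>
</root>"""

def lookup_pids(cat_root, keyword):
    # Steps 1 and 2: exact (not substring) match on <parent>, collecting catids.
    catids = {
        cat.get("catid")
        for cat in cat_root.iter("cat")
        if cat.findtext("parent") == keyword
    }
    # Steps 3 and 4: every cat-assign with one of those catids yields a pid.
    return [a.get("pid") for a in cat_root.iter("cat-assign") if a.get("catid") in catids]

root = ET.fromstring(CAT_XML)
print(lookup_pids(root, "specific1"))  # ['x13', 'x14', 'x15', 'x24', 'x43', 'x44', 'x45']
print(lookup_pids(root, "specific2"))  # ['x27', 'x31', 'x32', 'x33', 'x34']
```

Because the comparison is `==`, a keyword such as "specific" matches nothing, which is the exact-match behaviour asked for in step 1.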
TimJohnson

1 Answer


Assuming XSLT 3.0 (which the current version of XMLSpy supports), you can use the following stylesheet. It assumes that the document you want to manipulate is the primary input and that the URI of the other document is passed as the parameter cat-uri:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <xsl:param name="cat-uri" as="xs:string" select="'cat.xml'"/>

    <xsl:param name="new" as="xs:string" select="'inserted keyword'"/>

    <xsl:param name="word-list" as="xs:string*" select="'specific1', 'specific2'"/>

    <xsl:param name="cat-doc" select="doc($cat-uri)"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:key name="match" match="cat" use="some $word in $word-list satisfies contains(@catid, $word)"/>

    <xsl:key name="ref" match="prod[@prod-id]/display-name" use="../@prod-id"/>

    <xsl:variable name="pids" select="key('match', true(), $cat-doc)/@pid"/>

    <xsl:template match="key('ref', $pids)/text()">
        <xsl:value-of select="$new || ' ' || ."/>
    </xsl:template>

</xsl:stylesheet>

As for your changed input, you would have to adapt the keys. To match the cat elements on the value of their parent child element, you can declare a key `<xsl:key name="cat-match" match="cat" use="parent"/>`; then `key('cat-match', $word-list, $cat-doc)/@catid` gives us the catid attribute values of the cat-assign elements we need to reference. Because key() compares values for equality, this is an exact match on the parent value, not a contains() test. To follow those catid values, we can define another key, `<xsl:key name="cat-assign" match="cat-assign" use="@catid"/>`; then `key('cat-assign', key('cat-match', $word-list, $cat-doc)/@catid, $cat-doc)/@pid` gives us the values used to reference the prod elements in the primary input. The rest is unchanged:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <xsl:param name="cat-uri" as="xs:string" select="'cat.xml'"/>

    <xsl:param name="new" as="xs:string" select="'inserted keyword'"/>

    <xsl:param name="word-list" as="xs:string*" select="'specific1', 'specific2'"/>

    <xsl:param name="cat-doc" select="doc($cat-uri)"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:key name="cat-match" match="cat" use="parent"/>

    <xsl:key name="cat-assign" match="cat-assign" use="@catid"/>

    <xsl:key name="ref" match="prod[@prod-id]/display-name" use="../@prod-id"/>

    <xsl:variable name="pids" select="key('cat-assign', key('cat-match', $word-list, $cat-doc)/@catid, $cat-doc)/@pid"/>

    <xsl:template match="key('ref', $pids)/text()">
        <xsl:value-of select="$new || ' ' || ."/>
    </xsl:template>

</xsl:stylesheet>
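To see what each key() call in the two-key lookup above returns, it can be mimicked in Python. This is only an illustrative analogue, not how Saxon implements xsl:key: each key becomes a dictionary from key value to elements, and passing several values returns the combined matches. The sample is trimmed from the cat.xml shown below, and `build_index` and `key` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed from the cat.xml sample in this answer, wrapped in <root>.
CAT_XML = """<root>
<cat catid="c1"><parent>specific1</parent></cat>
<cat catid="c2"><parent>specific1</parent></cat>
<cat catid="c5"><parent>specific2</parent></cat>
<cat catid="c6"><parent>specific3</parent></cat>
<cat-assign catid="c1" pid="x13"/>
<cat-assign catid="c1" pid="x1"/>
<cat-assign catid="c2" pid="x24"/>
<cat-assign catid="c5" pid="x2"/>
</root>"""

def build_index(elements, key_of):
    """Rough analogue of xsl:key: map each key value to the elements having it."""
    index = defaultdict(list)
    for el in elements:
        index[key_of(el)].append(el)
    return index

def key(index, values):
    """Rough analogue of key() with a sequence: all matches for all values.

    Unlike XSLT's key(), results come back grouped by lookup value rather
    than in document order, and no duplicate elimination is done.
    """
    return [el for value in values for el in index.get(value, [])]

root = ET.fromstring(CAT_XML)
cat_match = build_index(root.iter("cat"), lambda c: c.findtext("parent"))
cat_assign = build_index(root.iter("cat-assign"), lambda a: a.get("catid"))

word_list = ["specific1", "specific2"]
catids = [c.get("catid") for c in key(cat_match, word_list)]
pids = [a.get("pid") for a in key(cat_assign, catids)]
print(catids)  # ['c1', 'c2', 'c5']
print(pids)    # ['x13', 'x1', 'x24', 'x2']
```

Building each index once means every lookup is a dictionary access rather than a rescan of all cat or cat-assign elements, which is the usual reason to reach for xsl:key on large inputs.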

When I run that inside Oxygen with Saxon 9.6 EE, with cat.xml being

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <cat catid="c1">
        <parent>specific1</parent>
    </cat>
    <cat catid="c2">
        <parent>specific1</parent>
    </cat>
    <cat catid="c3">
        <parent>specific1</parent>
    </cat>
    <cat catid="c4">
        <parent>specific2</parent>
    </cat>
    <cat catid="c5">
        <parent>specific2</parent>
    </cat>
    <cat catid="c6">
        <parent>specific3</parent>
    </cat>
    <cat-assign catid="c1" pid="x13"/>
    <cat-assign catid="c1" pid="x1"/>
    <cat-assign catid="c1" pid="x15"/>
    <cat-assign catid="c2" pid="x24"/>
    <cat-assign catid="c2" pid="x43"/>
    <cat-assign catid="c2" pid="x44"/>
    <cat-assign catid="c3" pid="x45"/>
    <cat-assign catid="c4" pid="x27"/>
    <cat-assign catid="c5" pid="x31"/>
    <cat-assign catid="c5" pid="x2"/>
    <cat-assign catid="c5" pid="x33"/>
    <cat-assign catid="c5" pid="x34"/>
</root>

and the input document being

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <prod prod-id="x1">
        <display-name xml:lang="x-default">some text</display-name>
    </prod>
    <prod prod-id="x2">
        <display-name xml:lang="x-default">some more text</display-name>
    </prod>
    <prod prod-id="x5">
        <display-name xml:lang="x-default">some text</display-name>
    </prod>
</root>

the result is

<?xml version="1.0" encoding="UTF-8"?><root>
    <prod prod-id="x1">
        <display-name xml:lang="x-default">inserted keyword some text</display-name>
    </prod>
    <prod prod-id="x2">
        <display-name xml:lang="x-default">inserted keyword some more text</display-name>
    </prod>
    <prod prod-id="x5">
        <display-name xml:lang="x-default">some text</display-name>
    </prod>
</root>
Martin Honnen
  • Sir, this solution is a total overkill for a beginner in [xslt] but fun to see ;) kind of impressive solution. **+1** – uL1 Oct 20 '16 at 16:14
  • the answer does look impressive. will try that now and provide feedback shortly – TimJohnson Oct 20 '16 at 17:45
  • @Martin, since I'm on a Mac, I was using OxygenXML, and because my XML file is 250 MB, I'm running out of heap space. I need to figure out how to run this successfully – TimJohnson Oct 20 '16 at 18:55
  • Oxygen has XSLT 3.0 using Saxon 9.6 or 9.7 EE. As for the memory problems, check https://www.oxygenxml.com/doc/versions/18.0/ug-author/topics/set-parameters-for-application-launchers.html to give it more memory; with an XML file of 250 MB I think you can expect the transformation to need at least 1 GB. – Martin Honnen Oct 20 '16 at 19:00
  • @MartinHonnen the link above worked after increasing Xmx to 12 GB, but the output XML is exactly the same as the original. Looking into why... – TimJohnson Oct 20 '16 at 22:28
  • @MartinHonnen Hi, thank you so much again for your help, you are truly a master of xslt. I just updated my xml in the question which was the reason it was not doing anything. Could you let me know what it would look like with updated xslt? – TimJohnson Oct 20 '16 at 23:32
  • @MartinHonnen, again huge thanks, I think we're getting close. Although it didn't do anything, I think it's because the XML that I have is nested among many other XML elements. I changed match to //cat and //prod to no avail. I'll keep trying to find what is different in my files and mark your answer – TimJohnson Oct 21 '16 at 12:05
  • @MartinHonnen I changed to: but still not doing anything – TimJohnson Oct 21 '16 at 12:13
  • @TimJohnson, putting a leading `//` in front of a match pattern is not changing anything. So those changes do not achieve any change. Does the suggested code work for you with the samples as shown? Which version of Oxygen is that, which version of Saxon 9 do you execute the code with in Oxygen? – Martin Honnen Oct 21 '16 at 12:17
  • @MartinHonnen my root had xmlns="my custom one" so it wasn't transforming. After removing it, it works fine. Thanks again. – TimJohnson Oct 22 '16 at 02:24
  • You don't have to remove a namespace declaration, you can define `xpath-default-namespace="my custom one"` in your XSLT to deal with XML in a namespace. It might be a bit more complicated if the two input documents have different namespaces but you would simply need to show that in your question to allow us to help with the right code. – Martin Honnen Oct 22 '16 at 09:40
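The namespace problem discussed in the comments above can be reproduced outside XSLT. In this Python ElementTree sketch, `NS` is a hypothetical URI standing in for "my custom one": an unqualified name finds nothing once the document has a default namespace, while a namespace-qualified lookup works. In the stylesheet itself, the fix suggested above is `xpath-default-namespace`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for "my custom one".
NS = "urn:example:catalog"
PROD_XML = f"""<root xmlns="{NS}">
<prod prod-id="x1"><display-name>some text</display-name></prod>
</root>"""

root = ET.fromstring(PROD_XML)

# Unqualified names do not match elements that are in a default namespace...
print(root.findall("prod"))  # []

# ...but names qualified with that namespace do.
ns = {"c": NS}
print([p.get("prod-id") for p in root.findall("c:prod", ns)])  # ['x1']
```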