
I have XML data (GraphML) that I need to transform for my application. The XML represents a graph with nodes labelled "User" and "Item", and edges labelled "HAS_HOBBY" and "FRIEND_OF".

Given a specific user, I want the transform to output all of his friends that share at least one hobby with him, together with those hobbies (represented by items). Friends are represented by "FRIEND_OF" edge elements, and hobbies by "HAS_HOBBY" edges.

I have an XSLT (I'm fairly new at this) that can find the needed items and the friends, but with my logic I can't manage to copy a friend just once: he is copied once for every hobby he shares with the original user. I go over each of the friend's hobbies for each of the user's hobbies, and when there's a match I print the item (the hobby), which is fine, and the friend. However, the friend is printed every time a match is found, resulting in multiple occurrences of the same friend, which is undesired.

I tried searching for ways to avoid this, but I think the whole logic of my solution is flawed. I have no other ideas, though.

Here's my XSL:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://graphml.graphdrawing.org/xmlns"
    xmlns="http://graphml.graphdrawing.org/xmlns"
    exclude-result-prefixes="ns #default">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>



  <!--Identity template: default copy all content into the output -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Don't copy tags called 'node or edge' -->
  <xsl:template match="ns:node" />
  <xsl:template match="ns:edge" />



  <xsl:template match="ns:node[ns:data[@key='username' and . = 'c']]">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>

    <xsl:variable name="USERID" select="@id"/>

    <xsl:for-each select="//ns:edge"> 

      <xsl:if test="@source=$USERID">

        <xsl:variable name="TARGET" select="@target"/>
        <xsl:for-each select="//ns:node[@id=$TARGET]">
          <!-- finds USERNAME's hobbies -->

          <xsl:for-each select="//ns:edge[@source=$USERID and @label='HAS_HOBBY']">
            <xsl:variable name="HOBBYTARGET" select="@target"/>
            <xsl:for-each select="//ns:edge[@source=$TARGET and @label='HAS_HOBBY']">
              <xsl:if test="@target=$HOBBYTARGET">
                <!-- Shared hobby with friend -->
                <xsl:for-each select="//ns:node[@id=$HOBBYTARGET]">
                  <xsl:copy>
                    <xsl:apply-templates select="node()|@*"/>
                  </xsl:copy>
                </xsl:for-each>


              </xsl:if>
            </xsl:for-each>  
          </xsl:for-each>
        </xsl:for-each>
      </xsl:if>

      <xsl:if test="@target=$USERID">

        <xsl:variable name="SOURCE" select="@source"/>
        <xsl:for-each select="//ns:node[@id=$SOURCE]">
          <!-- finds USERNAME's hobbies -->

          <xsl:for-each select="//ns:edge[@source=$USERID and @label='HAS_HOBBY']">
            <xsl:variable name="HOBBYTARGET" select="@target"/>
            <xsl:for-each select="//ns:edge[@source=$SOURCE and @label='HAS_HOBBY']">
              <xsl:if test="@target=$HOBBYTARGET">
                <!-- Shared hobby with friend -->
                <xsl:for-each select="//ns:node[@id=$HOBBYTARGET]">
                  <xsl:copy>
                    <xsl:apply-templates select="node()|@*"/>
                  </xsl:copy>
                </xsl:for-each>


              </xsl:if>
            </xsl:for-each>  
          </xsl:for-each>

        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>

  </xsl:template>

</xsl:stylesheet>

At the moment the copy of the friend is missing, but it would go right after the "Shared hobby with friend" comment.

I realised I can't use a 'flag' type variable (since XSLT variables can't be reassigned), and there's no way to have arrays or a similar data structure, so I'm really out of ideas.

Please help me get a user's friends that he shares at least one hobby (item) with, and the hobbies themselves.

EDIT: Sample input. I added a graph visualisation as well so it's easy to see:

[Image: graph visualisation of the sample input]

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">

<node id="n2" labels=":Item"><data key="labels">:Item</data><data key="itemId">Q1</data></node>
<node id="n32" labels=":Item"><data key="labels">:Item</data><data key="itemId">Q8</data></node>
<node id="n51" labels=":Item"><data key="labels">:Item</data><data key="itemId">Q23</data></node>
<node id="n897" labels=":Item"><data key="labels">:Item</data><data key="itemId">Q55</data></node>

<node id="n406727" labels=":User"><data key="labels">:User</data><data key="hobbies">[Ljava.lang.String;@78ba00a3</data><data key="firstName">a</data><data key="imgPath">/uploads/a.png</data><data key="surName">a</data><data key="username">a</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>
<node id="n406729" labels=":User"><data key="labels">:User</data><data key="hobbies"></data><data key="firstName">b</data><data key="imgPath">/uploads/b.png</data><data key="surName">b</data><data key="username">b</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>
<node id="n406731" labels=":User"><data key="labels">:User</data><data key="hobbies"></data><data key="blocked">[Ljava.lang.String;@7b800b40</data><data key="firstName">c</data><data key="imgPath">/uploads/c.png</data><data key="surName">c</data><data key="username">c</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>
<node id="n406734" labels=":User"><data key="labels">:User</data><data key="hobbies"></data><data key="firstName">d</data><data key="imgPath">/uploads/d.png</data><data key="surName">d</data><data key="username">d</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>

<edge id="e1223400" source="n406727" target="n406729" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>
<edge id="e1223403" source="n406727" target="n406731" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>
<edge id="e1223405" source="n406734" target="n406731" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>
<edge id="e1223405" source="n406727" target="n406734" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>

<edge id="e1223374" source="n406727" target="n2" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223385" source="n406727" target="n51" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223383" source="n406729" target="n2" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223384" source="n406731" target="n2" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223375" source="n406731" target="n51" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223371" source="n406734" target="n897" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>

</graph>
</graphml>

And here's the sample output. You can see that only c and b are left in the result, since they have hobbies (the Q items) in common with a. So d, the edge a-d, and the items Q55 and Q8 are gone.

[Image: graph visualisation of the sample output]

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">

<node id="n2" labels=":Item"><data key="labels">:Item</data><data key="itemId">Q1</data></node>
<node id="n51" labels=":Item"><data key="labels">:Item</data><data key="itemId">Q23</data></node>

<node id="n406727" labels=":User"><data key="labels">:User</data><data key="hobbies">[Ljava.lang.String;@78ba00a3</data><data key="firstName">a</data><data key="imgPath">/uploads/a.png</data><data key="surName">a</data><data key="username">a</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>
<node id="n406729" labels=":User"><data key="labels">:User</data><data key="hobbies"></data><data key="firstName">b</data><data key="imgPath">/uploads/b.png</data><data key="surName">b</data><data key="username">b</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>
<node id="n406731" labels=":User"><data key="labels">:User</data><data key="hobbies"></data><data key="blocked">[Ljava.lang.String;@7b800b40</data><data key="firstName">c</data><data key="imgPath">/uploads/c.png</data><data key="surName">c</data><data key="username">c</data><data key="gender">Male</data><data key="relaStatus">Single</data></node>

<edge id="e1223400" source="n406727" target="n406729" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>
<edge id="e1223403" source="n406727" target="n406731" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>
<edge id="e1223405" source="n406734" target="n406731" label="FRIEND_OF"><data key="label">FRIEND_OF</data></edge>

<edge id="e1223374" source="n406727" target="n2" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223385" source="n406727" target="n51" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223383" source="n406729" target="n2" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223384" source="n406731" target="n2" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>
<edge id="e1223375" source="n406731" target="n51" label="HAS_HOBBY"><data key="label">HAS_HOBBY</data></edge>

</graph>
</graphml>

Thank you for your time.

EDIT 2: Added data for label nodes and hasLabel edges:

<node id="n3" labels=":Label"><data key="labels">:Label</data><data key="en-gb">Universe</data></node>
<edge id="e0" source="n2" target="n3" label="hasLabel"><data key="label">hasLabel</data></edge>

This edge connects the node n2 which has the itemId of Q1 to the node n3 which has its label, "Universe".

Zephyer

2 Answers


Q: "... all his friends that share at least one hobby with him ..."
Here is a first possibility for doing that.

Create a variable holding all hobby edges for the user id:

<xsl:variable name="hobbies" select="//ns:edge[@source=$USERID and @label='HAS_HOBBY']"/>

Do the same for all friend edges (here, those where the user is the target):

<xsl:variable name="friends" select="//ns:edge[@target=$USERID and @label='FRIEND_OF']"/>

Then the friends with a shared hobby would be:

<xsl:variable name="friends_with_bobby"
   select="$friends[ //ns:edge[ @label='HAS_HOBBY'  and 
         @target = $hobbies/@target]/@source=./@source   ]"/>

To test this, try:

<xsl:template match="ns:node[ns:data[@key='username' and . = 'c']]">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>

    <xsl:variable name="USERID" select="@id"/>

    <xsl:variable name="hobbies" select="//ns:edge[@source=$USERID and @label='HAS_HOBBY']"/>
    <xsl:variable name="friends" select="//ns:edge[@target=$USERID and @label='FRIEND_OF']"/>
    <xsl:variable name="friends_with_bobby" select="$friends[ //ns:edge[ @label='HAS_HOBBY'  and @target = $hobbies/@target]/@source=./@source   ]"/>
    <hobbies>
        <xsl:copy-of select="$hobbies" />
    </hobbies>
    <friends>
        <xsl:copy-of select="$friends" />
    </friends>
    <friends_with_bobby>
        <xsl:copy-of select="$friends_with_bobby" />
    </friends_with_bobby>
</xsl:template>

These are only the edges, but it should be easy to adapt this to your requested output (otherwise let me know).
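To go from the edges to the nodes in the requested output, one option is to select the nodes by id from node-sets of edge attributes, so that each friend is selected once no matter how many hobbies match. The stylesheet below is a minimal XSLT 1.0 sketch of that idea, not a tested drop-in solution. The parameter `username` and the variables `hobby_ids`, `friend_ids` and `shared_edges` are names chosen for illustration, and it assumes that FRIEND_OF edges should be followed in both directions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://graphml.graphdrawing.org/xmlns"
    exclude-result-prefixes="ns">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- Hypothetical parameter: username of the start user -->
  <xsl:param name="username" select="'a'"/>

  <!-- Keep the graphml wrapper and its attributes -->
  <xsl:template match="/ns:graphml">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="ns:graph"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="ns:graph">
    <xsl:variable name="user"
        select="ns:node[ns:data[@key='username'] = $username]"/>
    <xsl:variable name="USERID" select="$user/@id"/>

    <!-- ids of the item nodes the start user has as hobbies -->
    <xsl:variable name="hobby_ids"
        select="ns:edge[@label='HAS_HOBBY' and @source=$USERID]/@target"/>

    <!-- ids of friends, following FRIEND_OF edges in both directions (assumption) -->
    <xsl:variable name="friend_ids"
        select="ns:edge[@label='FRIEND_OF' and @source=$USERID]/@target
              | ns:edge[@label='FRIEND_OF' and @target=$USERID]/@source"/>

    <!-- HAS_HOBBY edges that lead from a friend to one of the user's hobbies -->
    <xsl:variable name="shared_edges"
        select="ns:edge[@label='HAS_HOBBY'
                        and @source=$friend_ids
                        and @target=$hobby_ids]"/>

    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- A node-set holds each node at most once, so no friend is repeated -->
      <xsl:copy-of select="ns:node[@id=$shared_edges/@target]
                         | $user
                         | ns:node[@id=$shared_edges/@source]"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input with username 'a', this should select the items n2 and n51 together with the users a, b and c. The edges could be filtered in the same way, for example by also copying `$shared_edges`.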

Update: To get all users with the same hobby (not necessarily friends), try:

    <xsl:variable name="shared_hobby" select="//ns:edge[ @label='HAS_HOBBY'  and @target = $hobbies/@target]"/>
    <xsl:variable name="n_user_shared_hobby" select="//ns:node[ns:data[@key='username'] and @id=$shared_hobby/@source]"/>
hr_117

Here is an example using XSLT 2.0 (as supported by Saxon 9, XmlPrime, Altova, Exselt) using keys to reference the items and then set operations like intersect to only output shared nodes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    xpath-default-namespace="http://graphml.graphdrawing.org/xmlns"
    version="2.0">

<xsl:param name="user-name" as="xs:string" select="'c'"/>

<xsl:output indent="yes"/>

<xsl:key name="user-name" match="node[@labels = ':User']" use="data[@key = 'username']"/>

<xsl:key name="node-id" match="node" use="@id"/>

<xsl:key name="source-friends" match="edge[@label = 'FRIEND_OF']" use="@source"/>
<xsl:key name="target-friends" match="edge[@label = 'FRIEND_OF']" use="@target"/>
<xsl:key name="source-hobbies" match="edge[@label = 'HAS_HOBBY']" use="@source"/>

<xsl:variable name="start-node" select="key('user-name', $user-name)"/>

<xsl:variable name="start-friends"
               select="key('node-id', key('source-friends', $start-node/@id)/@target) |
                       key('node-id', key('target-friends', $start-node/@id)/@source)"/>

<xsl:variable name="start-hobbies" select="key('node-id', key('source-hobbies', $start-node/@id)/@target)"/>

<xsl:variable name="friends-with-shared-hobby" select="$start-friends[key('node-id', key('source-hobbies', @id)/@target) intersect $start-hobbies]"/>

<xsl:variable name="shared-hobbies" select="$start-hobbies intersect key('node-id', key('source-hobbies', $friends-with-shared-hobby/@id)/@target)"/>

<xsl:template match="/*">
    <xsl:copy>
        <xsl:copy-of select="$start-node | $friends-with-shared-hobby | $shared-hobbies"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
Martin Honnen
  • Thanks Martin, this seems to work for 'a'! However, when I provide user-name='b' or user-name='c' I only get the user element 'b' or 'c', without hobbies and without 'a' (which shares a hobby with both b and c). I use Saxon 9. I think it's because here "" you only take into consideration friends that are in the source, but friends can also be in the target of the FRIEND_OF edge. Can you please modify the solution to include that? Thanks a lot. – Zephyer Mar 30 '16 at 11:59
  • For example, in this edge - a is n406727 and b is n406729. So from a's perspective, b is a friend in the 'target' of the edge, but from b's perspective, a is a friend in the 'source' of the edge. I believe that's what is lacking for a complete solution. – Zephyer Mar 30 '16 at 12:01
  • @Zephyer, I have corrected the code sample and added a further key to traverse the friend references in both directions. – Martin Honnen Mar 30 '16 at 12:46
  • Martin, thanks a lot. It is amazing how elegant the solution is. I marked the answer as accepted. However, let's assume I also want to add the FRIEND_OF edges between the user and his friends (those he shares a hobby with). If you don't mind, please update the solution to include that as well. Thanks again! – Zephyer Mar 30 '16 at 12:49
  • I think one way using the defined keys is to select `` and then copy them to the output in ``. – Martin Honnen Mar 30 '16 at 13:47
  • Martin, thank you so much! I have one last request. I tried to use what you've shown me to extend the functionality: each of those Q items is connected to another node holding its label (because Q is just a code). I would love it if you could show me how to add those edges and label nodes as well. I edited the main post with example data for those new nodes and edges; the data is now at the end of the main post. Thanks again! – Zephyer Mar 30 '16 at 14:25
  • Anyone else can also attempt this last request of mine. I'm stuck trying to work out how Martin did his magic in order to get that last thing done. – Zephyer Mar 30 '16 at 15:57
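
The inline code in the comment about FRIEND_OF edges did not survive, so the following is only one possible reading of that suggestion, extended to the label request from the last comments. It is a sketch for an XSLT 2.0 processor that reuses the keys and variables of the answer above. The key `source-labels` and the variables `friend-edges`, `hobby-edges`, `label-edges` and `label-nodes` are names introduced here for illustration, and it assumes that label edges carry `label="hasLabel"` as in the sample data of the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    xpath-default-namespace="http://graphml.graphdrawing.org/xmlns"
    version="2.0">

<xsl:param name="user-name" as="xs:string" select="'c'"/>

<xsl:output indent="yes"/>

<!-- Keys and variables taken from the answer above -->
<xsl:key name="user-name" match="node[@labels = ':User']" use="data[@key = 'username']"/>
<xsl:key name="node-id" match="node" use="@id"/>
<xsl:key name="source-friends" match="edge[@label = 'FRIEND_OF']" use="@source"/>
<xsl:key name="target-friends" match="edge[@label = 'FRIEND_OF']" use="@target"/>
<xsl:key name="source-hobbies" match="edge[@label = 'HAS_HOBBY']" use="@source"/>

<!-- Assumed extra key: hasLabel edges indexed by the item they start from -->
<xsl:key name="source-labels" match="edge[@label = 'hasLabel']" use="@source"/>

<xsl:variable name="start-node" select="key('user-name', $user-name)"/>

<xsl:variable name="start-friends"
               select="key('node-id', key('source-friends', $start-node/@id)/@target) |
                       key('node-id', key('target-friends', $start-node/@id)/@source)"/>

<xsl:variable name="start-hobbies" select="key('node-id', key('source-hobbies', $start-node/@id)/@target)"/>

<xsl:variable name="friends-with-shared-hobby" select="$start-friends[key('node-id', key('source-hobbies', @id)/@target) intersect $start-hobbies]"/>

<xsl:variable name="shared-hobbies" select="$start-hobbies intersect key('node-id', key('source-hobbies', $friends-with-shared-hobby/@id)/@target)"/>

<!-- FRIEND_OF edges between the start user and a friend with a shared hobby, in either direction -->
<xsl:variable name="friend-edges"
               select="key('source-friends', $start-node/@id)[@target = $friends-with-shared-hobby/@id] |
                       key('target-friends', $start-node/@id)[@source = $friends-with-shared-hobby/@id]"/>

<!-- HAS_HOBBY edges from the start user or those friends to a shared hobby -->
<xsl:variable name="hobby-edges"
               select="key('source-hobbies', ($start-node | $friends-with-shared-hobby)/@id)[@target = $shared-hobbies/@id]"/>

<!-- hasLabel edges of the shared hobbies and the label nodes they point to -->
<xsl:variable name="label-edges" select="key('source-labels', $shared-hobbies/@id)"/>
<xsl:variable name="label-nodes" select="key('node-id', $label-edges/@target)"/>

<xsl:template match="/*">
    <xsl:copy>
        <!-- The union is emitted in document order and without duplicates -->
        <xsl:copy-of select="$start-node | $friends-with-shared-hobby | $shared-hobbies | $label-nodes |
                             $friend-edges | $hobby-edges | $label-edges"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Like the answer's stylesheet, this copies the selected elements directly under the root element rather than inside `graph`. If the wrapper is needed, the `xsl:copy-of` could be moved into a template matching `graph`.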