I'm trying to get a unique set of data from the XML below:

<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Smith</name>
  <name>John</name>
  <name>Adam</name>
</output>
<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>John</name>
  <name>Smith</name>
  <name>Adam</name>
</output>
<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Adam</name>
  <name>Smith</name>
  <name>John</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Adam</name>
  <name>Jeff</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Jeff</name>
  <name>Adam</name>
</output>

Since the three DB output blocks contain the same information, I only need to pick one of them. But when I use the distinct-values() function, I get all three, each with its names in their original order.

I have assigned the above XML to $final, and below is what I'm getting:

for $f in distinct-values($final)
return $f

output

DBDatabase systemsSmithJohnAdam
DBDatabase systemsJohnSmithAdam
DBDatabase systemsAdamSmithJohn

expected

<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Smith</name>
  <name>John</name>
  <name>Adam</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Adam</name>
  <name>Jeff</name>
</output>

The order of the names does not matter. I tried sorting the name tags, but that is not working out because it adds too much code. Is there any logic in XQuery to get one copy of each block from the above XML?

rachithr
  • The three `<output>` elements are *structurally* different, so a simple `deep-equal(...)` does not work. When exactly do you consider two elements to contain "the same information"? What if a `<name>` was duplicated in one of them? Do you only want to disregard ordering? – Leo Wörteler May 15 '20 at 09:56
  • If the 3 `<output>` tags are "equivalent" to you, why not just grab the first (`[1]`) tag and its children? – Jack Fleeting May 15 '20 at 12:05
  • I consider two elements equal when the info in them is the same. Consider the above case as a textbook and its authors. All three are the same. I can't take only the first, because this is a sample among many other textbooks and the number of repetitions varies – rachithr May 15 '20 at 17:49
  • If that's the case, you may need to expand the sample xml in the question to show another case and how these cases relate to each other; for example, is the "same information" repeated sequentially (for example, 3 times in a row, as in your question) or can it be mingled with "same information" from another book/authors? – Jack Fleeting May 15 '20 at 18:17
  • Updated the Question. The "title" is key here. Need to know books and their authors. You can assume that there is no other entry for the same book with different sets of authors. – rachithr May 15 '20 at 18:29
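
If two elements should count as equal whenever they carry the same category, title, and names regardless of order, one possible approach is to compare a normalized key instead of the elements themselves. The sketch below is not taken from any of the comments: `local:key` and the `|` separator are hypothetical choices, the sample data is shortened, and it assumes that no value contains `|` and that each element has exactly one category and one title.

```xquery
(: Hypothetical helper: builds an order-insensitive key for one output
   element from its category, title, and alphabetically sorted names. :)
declare function local:key($o as element(output)) as xs:string {
  string-join(
    (string($o/category),
     string($o/title),
     for $n in $o/name order by string($n) return string($n)),
    "|")
};

(: Shortened sample data standing in for $final from the question. :)
let $final := (
  <output><category>DB</category><title>Database systems</title>
    <name>Smith</name><name>John</name><name>Adam</name></output>,
  <output><category>DB</category><title>Database systems</title>
    <name>John</name><name>Smith</name><name>Adam</name></output>,
  <output><category>Others</category><title>Pattern Recognition</title>
    <name>Adam</name><name>Jeff</name></output>
)
(: One key per element; distinct-values drops the repeated keys. :)
for $k in distinct-values(for $o in $final return local:key($o))
(: Keep the first element that produces this key. :)
return ($final[local:key(.) = $k])[1]
```

The order in which `distinct-values` returns its items is implementation-dependent, so the two resulting elements may come back in either order. An element with a duplicated name would get a different key from one without the duplicate.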

3 Answers

Try something along these lines on your actual xml:

let $inv :=
<doc>
 [your xml above]
</doc>
(: collect every title, then keep the first output for each distinct title :)
let $titles := $inv//output/title
for $title in distinct-values($titles)
return ($inv//output[title = $title])[1]

Output:

<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Smith</name>
  <name>John</name>
  <name>Adam</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Adam</name>
  <name>Jeff</name>
</output>
Jack Fleeting

An option could be:

doc("data.xml")//output/*[not(preceding::*=.)]

Output (for the three DB blocks):

<category>DB</category>
<title>Database systems</title>
<name>Smith</name>
<name>John</name>
<name>Adam</name>
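
The expression above filters individual child elements, so the `<output>` wrappers are not part of the result. A variant that keeps whole elements might look like the sketch below. It assumes, as the asker states, that `title` identifies a book, and that all `<output>` elements are siblings under one parent. The `<doc>` wrapper, `$inv`, and the shortened data are placeholders.

```xquery
let $inv :=
  <doc>
    <output><title>Database systems</title><name>Smith</name><name>John</name></output>
    <output><title>Database systems</title><name>John</name><name>Smith</name></output>
    <output><title>Pattern Recognition</title><name>Adam</name><name>Jeff</name></output>
  </doc>
(: Keep an output only if no earlier sibling output has the same title. :)
return $inv/output[not(title = preceding-sibling::output/title)]
```

On this sample it returns the first "Database systems" element and the "Pattern Recognition" element, in document order. Each element is compared against all of its preceding siblings, so this may become slow on large inputs.
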
E.Wiest

In XQuery 3, I think the shortest and most efficient approach is to use group by:

for $output in //output
group by $title := $output/title
return head($output)

https://xqueryfiddle.liberty-development.net/jyH9Xv5
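
After `group by`, `$output` no longer refers to a single element but to the sequence of all `<output>` elements that share the title, which is why `head($output)` yields one representative per book. A small sketch that makes this visible by counting the copies in each group follows. The `<book>` element and its attributes are made up for illustration, and `$final` with its shortened data stands in for the question's input.

```xquery
xquery version "3.0";

let $final := (
  <output><title>Database systems</title><name>Smith</name></output>,
  <output><title>Database systems</title><name>Smith</name></output>,
  <output><title>Pattern Recognition</title><name>Adam</name></output>
)
for $output in $final
group by $title := $output/title
(: $output is now the whole group, so count() gives the number of copies. :)
return <book title="{$title}" copies="{count($output)}"/>
```

The order in which groups are returned is implementation-dependent. Adding an `order by $title` clause after the `group by` should make it deterministic if that matters.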

Martin Honnen