0

I have the following XML:

<doc>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>She Sells Sea Shells by the Sea Shore and she also</ActivityNarrativeText>
  </ActivityNarrativeInformation>
 <ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>triple shot frappuccino, extra hot, with whipped cream in a tall cup </ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>likes to take long walks on the beach while she drinks a</ActivityNarrativeText>
  </ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>486</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>It was a dark and stormy night; the rain fell in torrents--except at occasional intervals, when
 </ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>488</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>scene lies), rattling along the housetops, and fiercely agitating the scanty flame of the lamps that
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>487</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>was checked by a violent gust of wind which swept up the streets (for it is in London that our
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>489</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>struggled against the darkness.
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31921</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>Papa Bear was very big and growly. Mamma Bear was middle-sized and pleasant.
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31923</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>Papa bear loved to fix things around the house; Mama bear loved to grow flowers in her garden; and, Baby bear loved playing in the yard. They were very happy. </ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31920</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>Once upon a time there were three bears, Papa Bear, Mamma Bear and Baby Bear
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31922</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>And Baby Bear, well, he was small, and
sometimes he squeaked! They lived in a pretty little house on the edge of the forest
</ActivityNarrativeText>
</ActivityNarrativeInformation>
</doc>

I need to group the ActivityNarrativeInformation elements by ActivityID and concatenate their ActivityNarrativeText values in ActivityNarrativeSequenceNumber order.

I managed to sort the elements with the following XPath 3.1 query:

sort(//ActivityNarrativeInformation[ActivityID=123456789], (), function($ActivityNarrativeSequenceNumber) {$ActivityNarrativeSequenceNumber})

So the result looks like this:

<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>She Sells Sea Shells by the Sea Shore and she also</ActivityNarrativeText>
  </ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>likes to take long walks on the beach while she drinks a</ActivityNarrativeText>
  </ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>triple shot frappuccino, extra hot, with whipped cream in a tall cup </ActivityNarrativeText>
</ActivityNarrativeInformation>

The problem, however, is that when I try to narrow the result down to just the ActivityNarrativeText elements by appending /ActivityNarrativeText, like this

sort(//ActivityNarrativeInformation[ActivityID=123456789], (), function($ActivityNarrativeSequenceNumber) {$ActivityNarrativeSequenceNumber})/ActivityNarrativeText

or

sort(//ActivityNarrativeInformation[ActivityID=123456789]/ActivityNarrativeText, (), function($seq) {$seq/ActivityNarrativeSequenceNumber})

The order is lost:

<ActivityNarrativeText>She Sells Sea Shells by the Sea Shore and she also</ActivityNarrativeText>
<ActivityNarrativeText>triple shot frappuccino, extra hot, with whipped cream in a tall cup </ActivityNarrativeText>
<ActivityNarrativeText>likes to take long walks on the beach while she drinks a</ActivityNarrativeText>

What am I doing wrong?

Macin

4 Answers

2

You lose the order when you write /ActivityNarrativeText: it returns the <ActivityNarrativeText> elements in the same order they have in the input file.

Applied to nodes, /something does not just map each node to its child.

It means:

  • Map it

  • Reorder all nodes to the input document order

  • Remove duplicates

You could use !ActivityNarrativeText instead. The simple map operator ! only maps, so the sorted order is kept.
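To make that concrete, here is a minimal sketch of the question's expression with ! in place of /. The explicit xs:integer() sort key and the variable name $info are my own choices rather than anything taken from the question; they make the sort numeric instead of relying on the string value of the whole element.

```xquery
(: Sort the matching elements by sequence number, then map each one
   to its text child with "!" so the sorted order is not discarded. :)
sort(
  //ActivityNarrativeInformation[ActivityID = 123456789],
  (),
  function($info) { xs:integer($info/ActivityNarrativeSequenceNumber) }
) ! ActivityNarrativeText
```

If a single string is wanted rather than a sequence of elements, the same idea should work with `! normalize-space(ActivityNarrativeText)` at the end and the whole expression wrapped in `string-join(..., " ")`.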

BeniBela
1

If what you want is to extract a coherent sentence from your sample XML for that particular ActivityID, this expression

string-join(sort(//ActivityNarrativeInformation[ActivityID=123456789]/ActivityNarrativeText/concat(normalize-space()," "), (), function($ActivityNarrativeSequenceNumber) {$ActivityNarrativeSequenceNumber}))

should output

She Sells Sea Shells by the Sea Shore and she also likes to take long walks on the beach while she drinks a triple shot frappuccino, extra hot, with whipped cream in a tall cup 
Jack Fleeting
  • Testing it here: https://www.videlibri.de/cgi-bin/xidelcgi, but still the results are as described, i.e. the order is incorrect – Macin Jul 03 '21 at 15:54
  • @Macin Interesting; I tried in Base-X and it worked, but now it seems it only works with that particular `ActivityID`, but not with the other two... Let me check further. – Jack Fleeting Jul 03 '21 at 16:20
1

Testing it here: videlibri.de/cgi-bin/xidelcgi

If you're using Xidel, then please add its tag, and perhaps a tag for your platform (Windows or Unix) as well.

I'm not too sure this can be done with XPath. I believe you're better off using XQuery.

For the narrative with <ActivityID>123456789</ActivityID> you could do:

$ xidel -s input.xml --xquery '
  normalize-space(
    for $x in //ActivityNarrativeInformation
    where $x/ActivityID = 123456789
    order by $x/ActivityNarrativeSequenceNumber
    return
    $x/ActivityNarrativeText
  )
'

For all narratives I'd suggest:

$ xidel -s input.xml --xquery '
  for $narrative at $i in //ActivityNarrativeInformation
  group by $id:=$narrative/ActivityID
  count $i
  return (
    $i,
    normalize-space(
      for $seq in $narrative
      order by $seq/ActivityNarrativeSequenceNumber
      return
      $seq/ActivityNarrativeText
    )
  )
'
1
Once upon a time there were three bears, [...]
2
She Sells Sea Shells by the Sea Shore and [...]
3
It was a dark and stormy night; the rain [...]

Group by <ActivityID> first, then in another for-loop order the sentences by <ActivityNarrativeSequenceNumber>.

Update 2021-07-05: I forgot about XPath's ! operator. With it, one for-loop is enough:

$ xidel -s input.xml --xquery '
  for $narrative at $i in //ActivityNarrativeInformation
  order by $narrative/ActivityNarrativeSequenceNumber
  group by $id:=$narrative/ActivityID
  count $i
  return (
    $i,
    normalize-space($narrative ! ActivityNarrativeText)
  )
'
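Processors that enforce the declared xs:string? argument type of normalize-space() (Saxon is one example) reject a multi-item sequence there. A rough sketch of a variant that normalizes each text individually and joins the pieces with string-join() might look like this; the `$id || ": "` label and the xs:integer() cast are additions for illustration, not part of the queries above:

```xquery
(: Group by ActivityID, then order each group's members numerically
   and join their whitespace-normalized texts with single spaces. :)
for $narrative in //ActivityNarrativeInformation
group by $id := $narrative/ActivityID
return
  $id || ": " || string-join(
    for $seq in $narrative
    order by xs:integer($seq/ActivityNarrativeSequenceNumber)
    return normalize-space($seq/ActivityNarrativeText),
    " "
  )
```

The order in which the groups themselves come out is implementation-dependent, so add an outer `order by $id` if that matters.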
Reino
  • How does normalize-space() work on a sequence here? normalize-space() is supposed to accept at most one string. When passing a sequence of strings to xidel's normalize-space() it appears to only act on the first string. Saxon and Zorba both, correctly, give a cardinality error. Is $narrative always going to have exactly one item in this example? – David Denenberg Jul 07 '21 at 12:31
  • @DavidDenenberg No, `$narrative` is a sequence. If `normalize-space()` with your `xidel` binary only returns the first string, then your binary is too old. `normalize-space()` accepts sequences as input. That's at least how I interpret https://www.w3.org/TR/xpath-functions-31/#func-normalize-space. And a recent `xidel` build supports it. – Reino Jul 07 '21 at 23:59
  • Yes, it appears to be an older binary. I won't doubt that recent versions of xidel support this, but it is an incorrect interpretation of the specification, which clearly shows "xs:string?" as the parameter type (zero or one atomic strings). – David Denenberg Jul 08 '21 at 13:58
  • @DavidDenenberg: Xidel also gives an error in [standard XQuery mode](https://videlibri.de/cgi-bin/xidelcgi?&data=&=&extract=normalize-space((%22a%22%2C%22b%22))%0A%0A&=&input-format=auto&printed-node-format=text&output-format=adhoc&compatibility=Standard%20XQuery&dot-notation=off&extract-kind=xquery3.1&no-extended-strings=true&no-json=true&no-json-literals=true&only-json-objects=true&strict-type-checking=true&strict-namespaces=true&case-sensitive=true). But all type checking is disabled by default because then the query evaluation runs like 20% faster – BeniBela Jul 19 '21 at 11:57
1

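In addition to the correct answer (use ! rather than / after sorting), one of your attempts would actually work if the sort key function selected the right element. In your second attempt, $seq is an ActivityNarrativeText element, which has no ActivityNarrativeSequenceNumber child, so every sort key is empty and the input order is kept. The sequence number is a sibling, so step up to the parent first: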

sort(//ActivityNarrativeInformation[ActivityID=123456789]/ActivityNarrativeText, (), function($text) {$text/../ActivityNarrativeSequenceNumber})
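One caveat, offered as a hedge rather than a correction: without a schema the sequence number is untyped, and fn:sort compares untyped keys as strings, so "10" would sort before "2". The sample data happens to avoid this because the numbers within each ActivityID all have the same number of digits. A sketch with an explicit numeric cast (the xs:integer() call is my addition) could look like this:

```xquery
(: Same expression as above, but the key is cast to an integer so that
   sequence numbers of different lengths still sort numerically. :)
sort(
  //ActivityNarrativeInformation[ActivityID = 123456789]/ActivityNarrativeText,
  (),
  function($text) { xs:integer($text/../ActivityNarrativeSequenceNumber) }
)
```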
Martin Honnen