0

(This Question has been edited.)

Dear Community,

I am not very experienced in XQuery and currently facing the following problem while working with CDISC xml documents. I have already spend quite a while on this specific Query and I know I'll be thrilled when I finally get it right.

I am using xquery version "3.0";

I have several records in one document that look like this.

<ItemGroupData ItemGroupOID="DM" data:ItemGroupDataSeq="1">
            <ItemData ItemOID="DM.USUBJID" Value="01-701-1015"/>
            <ItemData ItemOID="DM.RFXSTDTC" Value="2014-01-02"/>
  </ItemGroupData>
<ItemGroupData ItemGroupOID="DM" data:ItemGroupDataSeq="2">
            <ItemData ItemOID="DM.USUBJID" Value="01-701-1023"/>
            <ItemData ItemOID="DM.RFXSTDTC" Value="2014-02-02"/>
  </ItemGroupData>

and another document with several records like this:

    <ItemGroupData ItemGroupOID="EX" data:ItemGroupDataSeq="1" >
            <ItemData ItemOID="EX.USUBJID" Value="01-701-1015"/>
            <ItemData ItemOID="EX.EXSEQ" Value="1"/>
            <ItemData ItemOID="EX.EXSTDTC" Value="2014-06-19"/>
            <ItemData ItemOID="EX.EXENDTC" Value="2014-06-21"/>
    </ItemGroupData>
    <ItemGroupData ItemGroupOID="EX" data:ItemGroupDataSeq="2" >
            <ItemData ItemOID="EX.USUBJID" Value="01-701-1015"/>
            <ItemData ItemOID="EX.EXSEQ" Value="2"/>
            <ItemData ItemOID="EX.EXSTDTC" Value="2014-06-19"/>
            <ItemData ItemOID="EX.EXENDTC" Value="2014-06-27"/>
    </ItemGroupData>
    <ItemGroupData ItemGroupOID="EX" data:ItemGroupDataSeq="3" >
            <ItemData ItemOID="EX.USUBJID" Value="01-701-1015"/>
            <ItemData ItemOID="EX.EXSEQ" Value="3"/>
            <ItemData ItemOID="EX.EXSTDTC" Value="2014-06-19"/>
            <ItemData ItemOID="EX.EXENDTC" Value="2014-07-02"/>
    </ItemGroupData>
   <ItemGroupData ItemGroupOID="EX" data:ItemGroupDataSeq="4" >
            <ItemData ItemOID="EX.USUBJID" Value="01-701-1023"/>
            <ItemData ItemOID="EX.EXSEQ" Value="1"/>
            <ItemData ItemOID="EX.EXSTDTC" Value="2014-06-19"/>
            <ItemData ItemOID="EX.EXENDTC" Value="2014-07-02"/>
    </ItemGroupData>
   <ItemGroupData ItemGroupOID="EX" data:ItemGroupDataSeq="5" >
            <ItemData ItemOID="EX.USUBJID" Value="01-701-1023"/>
            <ItemData ItemOID="EX.EXSEQ" Value="2"/>
            <ItemData ItemOID="EX.EXSTDTC" Value="2014-06-19"/>
            <ItemData ItemOID="EX.EXENDTC" Value=""/>
    </ItemGroupData>

I want to check, if the last EXSTDTC (that with the max(EXSEQ) of the EX dataset equals the corresponding (with the same USUBJID - the unique ID) RFXSTDTC.

And suppose, EXENDTC is null, then RFXSTDTC should equal the last EXSTDTC (but if I only knew how to check only EXENDTC I could probably solve that one on my own)

I still just don’t quite know how to do actually write that and I would be very happy if someone of you could please show me how to do that.

Again, I hope that I was precise enough and that you can help me, thank you in advance!

Responding to the first comment: Thank you very much for your help! You are right, distinct-values are quite time-consuming since my dataset comprises about 300 records (and actually, I am only testing on the first 10 - takes already 6 seconds). To group by seems to be the right approach!

update(12/05/2015)

I tried to solve my problem in the following way but I do not get the value of exendtcvalue in my return expression. To be generic, I use $oids that I retrieved from another metadata xml document.

for $record in doc($dmdatasetlocation)//odm:ItemGroupData[@data:ItemGroupDataSeq < 5][odm:ItemData[@ItemOID=$usubjidoid]]
let $recnum := $record/@data:ItemGroupDataSeq
let $dmusubjidvalue := $record/odm:ItemData[@ItemOID=$usubjidoid]/@Value
let $rfxendtcvalue := $record/odm:ItemData[@ItemOID=$rfxendtcoid]/@Value
let $exendtcvalue := (
    for $a in doc($exdatasetlocation)//odm:ItemGroupData[odm:ItemData[@ItemOID=$exusubjidoid and @Value=$dmusubjidvalue]]
    let $maxseqvalue := functx:max-string(doc($exdatasetlocation)//ItemGroupData[odm:ItemData[@IItemOID=$exusubjidoid and @Value=$dmusubjidvalue]][odm:ItemData[@ItemOID=$exseqoid]]/@Value)
    where $a = doc($exdatasetlocation)//odm:ItemGroupData[odm:ItemData[@ItemOID=$exusubjidoid and @Value=$dmusubjidvalue]][odm:ItemData[@ItemOID=$exseqoid and @Value=$maxseqvalue]][odm:ItemData[@ItemOID=$exendtcoid]]/@Value
    return $a
)   


 where  not($exendtcvalue=$rfxendtcvalue)

 (: ((functx:if-empty($exendtcvalue, 0)) and :)
 (:     or (functx:if-empty($exendtcvalue, 0) and ($rfxendtcvalue=$exstdtcvalue)) :)

return <warning rule="SPC_DM_RFXENDTC_01" rulelastupdate="2015-02-11" recordnumber="{data($recnum)}">RFXENDTC, value={data($rfxendtcvalue)} in dataset {data($dmdatasetname)} is not an allowed value, it should be the same as the last EX.EXENDTC value ={data($exendtcvalue)} </warning>

Thank you for your help! Christiane

1 Answers1

0

You don't specify whether your XQuery processor supports 3.0 or only 1.0 - in 3.0 you can use group by, which would be particularly handy here. However, it's possible to achieve the same result in either version using distinct-values:

let $subject-ids := fn:distinct-values(//ItemGroupData/ItemData[@ItemOID="subject_ID"]/@Value)
for $id in $subject-ids
let $grouped := //ItemGroupData[ItemData[@ItemOID="subject_ID" and @Value=$id]]
let $max := fn:max($grouped/ItemData[@ItemOID="Sequence"]/@Value)
return $grouped[ItemData[@ItemOID="Sequence" and @Value=$max]]

For a large data set, distinct-values does not perform well. Depending on your XQuery processor, there may implementation specific optimizations you could use to improve performance, or in many cases that's simply using group by.

wst
  • 11,681
  • 1
  • 24
  • 39
  • Thank you for your response! I tried to specify the question because I could not come up with a solution so far and I know, distinct-values are (as you said) not performing well on (my) large dataset. – Christiane Prutsch May 09 '15 at 16:05
  • @ChristianePrutsch What XQuery processor are you using? There may be indexes available or that are configurable to facilitate this type of query. – wst May 09 '15 at 21:06
  • I am using Xquey Version 3.0 (eXist). I tried to improve my question. Thank you for your help! – Christiane Prutsch May 11 '15 at 06:59
  • @ChristianePrutsch In that case you can use `group by`, which eXist says is far more optimized than `distinct-values`. – wst May 11 '15 at 14:32
  • thank you! I tried to solve it with group by but I did not work and I am not enough experienced. I tried it differently - but I don't get the value of my exendtcvalue.. can you help me again? – Christiane Prutsch May 12 '15 at 10:19
  • @ChristianePrutsch Your question has changed so much, it might be best to ask another one. Simplify it if possible and be sure to tag it with `exist-db` so eXist experts will see it. – wst May 12 '15 at 14:25