
Using the following XML, can anyone tell me how, in Groovy (GPath or XPath), I can perform a select on the leftmost element and also include a reference back to the correct parent element?

<CompoundEmployee>
  <person>
    <person_id_external>21554</person_id_external>
    <employment_information>
      <start_date>2014-02-27</start_date>
      <job_information><end_date>2013-04-21</end_date><event>H</event><start_date>2012-09-28</start_date></job_information>
      <job_information><end_date>2013-04-26</end_date><event>5</event><start_date>2013-04-22</start_date></job_information>
      <job_information><end_date>9999-12-31</end_date><event>R</event><start_date>2014-02-27</start_date></job_information>
    </employment_information>
  </person>
  <person>
    <person_id_external>8265</person_id_external>
    <employment_information>
      <start_date>2000-10-02</start_date>
      <job_information><end_date>2014-10-24</end_date><event>5</event><start_date>2014-05-22</start_date></job_information>
      <job_information><end_date>2014-05-21</end_date><event>H</event><start_date>2000-10-02</start_date></job_information>
      <job_information><end_date>9999-12-31</end_date><event>5</event><start_date>2014-10-25</start_date></job_information>
    </employment_information>
  </person>
  <execution_timestamp>2015-05-05T08:17:51.000Z</execution_timestamp>
  <version_id>1502P0</version_id>
</CompoundEmployee>

The select statement written in English is:

"Start Date of Job Information record is less than Employment Information Start Date AND Job Information event type is one of Hire or Rehire"

The elements returned by the query must include person_id_external from the enclosing person along with start_date from job_information.

So far I have tried:

def xml = """ xml from above """
def list = new XmlSlurper().parseText(xml)
x = list.'**'.findAll { person ->
    person.event.text() in ['H','R'] && person.start_date.text() < list.person.employment_information.start_date.text()
} 
x.each { l -> println "Type -> ${l.event}, Start Date -> ${l.start_date}, End Date -> ${l.end_date}" }

This works great when there is only one person in the input file. When there are multiple employees, the results are incorrect because the wrong "list.person.employment_information.start_date" is referenced, i.e. the parent and child nodes being compared are not related.

Based on the above, the output is:

Type -> H, Start Date -> 2012-09-28, End Date -> 2013-04-21

Type -> R, Start Date -> 2014-02-27, End Date -> 9999-12-31

Type -> H, Start Date -> 2000-10-02, End Date -> 2014-05-21

when in fact it should return only 1 row:

Type -> H, Start Date -> 2012-09-28, End Date -> 2013-04-21

As you can see, I am nearly there, but I just can't work out how to reference and return the logically correct parent employment_information record.

Any ideas anyone?

Thanks, Greg

Stokie

1 Answer


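Querying with '**' and naming the closure variable person is misleading, because the nodes you are actually filtering there are the job_information elements. Iterate over each person instead, so that the comparison uses that person's own employment_information. Something like this: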

// Map each person's id to the job_information nodes that match within that person
def x = list.person.collectEntries { person ->
    [person.person_id_external.text(), person.employment_information.job_information.findAll { ji ->
        ji.event.text() in ['H','R'] && ji.start_date.text() < person.employment_information.start_date.text()
    }]
}
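As a minimal, self-contained sketch of the same per-person idea (the names `root`, `rows` and the map keys are made up for illustration, and the XML is trimmed to the two person records from the question): iterating the filtered result with `each` gives you every matching job_information node individually, rather than the concatenated text you get when the whole result is printed at once. The `import` is for Groovy 3 and later; on Groovy 2.x, `XmlSlurper` is imported by default and the line can be dropped.

```groovy
import groovy.xml.XmlSlurper

// Sample data copied from the question, without the trailing metadata elements
def xml = '''<CompoundEmployee>
  <person>
    <person_id_external>21554</person_id_external>
    <employment_information>
      <start_date>2014-02-27</start_date>
      <job_information><end_date>2013-04-21</end_date><event>H</event><start_date>2012-09-28</start_date></job_information>
      <job_information><end_date>2013-04-26</end_date><event>5</event><start_date>2013-04-22</start_date></job_information>
      <job_information><end_date>9999-12-31</end_date><event>R</event><start_date>2014-02-27</start_date></job_information>
    </employment_information>
  </person>
  <person>
    <person_id_external>8265</person_id_external>
    <employment_information>
      <start_date>2000-10-02</start_date>
      <job_information><end_date>2014-10-24</end_date><event>5</event><start_date>2014-05-22</start_date></job_information>
      <job_information><end_date>2014-05-21</end_date><event>H</event><start_date>2000-10-02</start_date></job_information>
      <job_information><end_date>9999-12-31</end_date><event>5</event><start_date>2014-10-25</start_date></job_information>
    </employment_information>
  </person>
</CompoundEmployee>'''

def root = new XmlSlurper().parseText(xml)

// One flat row per matching job_information, tagged with its own person's id
def rows = []
root.person.each { person ->
    def employment = person.employment_information
    // Start date of THIS person's employment_information only
    def employmentStart = employment.start_date.text()
    employment.job_information.findAll { ji ->
        // ISO dates (yyyy-MM-dd) compare correctly as plain strings
        ji.event.text() in ['H', 'R'] && ji.start_date.text() < employmentStart
    }.each { ji ->
        rows << [id       : person.person_id_external.text(),
                 event    : ji.event.text(),
                 startDate: ji.start_date.text(),
                 endDate  : ji.end_date.text()]
    }
}

rows.each { r ->
    println "Person -> ${r.id}, Type -> ${r.event}, Start Date -> ${r.startDate}, End Date -> ${r.endDate}"
}
```

With the sample data this should leave a single row, for person 21554 with event H, which matches the one row expected in the question. Collecting plain maps instead of node references is only one option; keeping the nodes themselves works just as well if you need other child elements later.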
cfrick
  • Thank you @cfrick, your code works a treat. One question though: all of the job_information elements that satisfy the findAll condition are concatenated together. If I follow this up with `x.each { println "Key -> ${it.key} Value -> ${it.value}" }`, how do I reference the individual job_information elements that are now held in it.value? An example using the above XML is: `Key -> 21554 Value -> 2014-07-01R2014-02-272013-04-21H2012-09-28 Key -> 8265 Value -> ` – Stokie May 07 '15 at 08:48