
The code below works, but it takes 15+ hours to process 6000 employee records. Are any improvements possible?

I have two record structures (employee data and employee benefits) for each of 6000 employees. I have merged them into a single XML document using the personnel number (for the XML structure, see my previous question: https://stackoverflow.com/questions/65174244/multiple-different-xml-structures-to-one-using-xml-using-xsl).

Now I have to append a node to an employee record in the XML when its ID (personIdExternal in multimap:Message1) matches the same ID (PERNR) in multimap:Message2.

xml.'**'.findAll{ it.name() == 'EmpEmployment' }.each{ p ->
    def perID = xml.'**'.find{ it.personIdExternal.text() == p.personIdExternal.text() }
    def pernr = xml.'**'.find{ it.PERNR.text() == '000' + perID.personIdExternal.text() }
    if (pernr != null) {
        perID.appendNode {
            erpBenEligibility(pernr.PARDT.text())
        }
    }
}
message.setBody(groovy.xml.XmlUtil.serialize(xml))

Sample XML:

<?xml version='1.0' encoding='UTF-8'?>
<multimap:Messages xmlns:multimap="http://sap.com/xi/XI/SplitAndMerge">
<multimap:Message1>
<person>
 <person>
    <street>test_stree1</street>
    <city>test_city1</city>
    <state>test_state1</state>
    <EmpEmployment>
     <personIdExternal> 001 </personIdExternal>
    </EmpEmployment>
 </person>
 <person>
    <street>test_stree2</street>
    <city>test_city2</city>
    <state>test_state2</state>
     <EmpEmployment>
     <personIdExternal> 002 </personIdExternal>
    </EmpEmployment>
 </person>
 <person>
    <street>test_stree3</street>
    <city>test_city3</city>
    <state>test_state3</state>
     <EmpEmployment>
     <personIdExternal> 003</personIdExternal>
     </EmpEmployment>
 </person>
</person>
</multimap:Message1>
<multimap:Message2>
<rfc:ZHR_GET_EMP_BENEFIT_DETAILS.Response xmlns:rfc="urn:sap-com:document:sap:rfc:functions">
 <phone>
  <home>
   <phone>number1</phone>
  </home> 
  <PERNR> 001 </PERNR>
  <PARDT>#### 1 ####</PARDT>
  <home>
   <phone>number2</phone>
  </home> 
  <PERNR> 002 </PERNR>
  <PARDT>#### 2 ####</PARDT>
  <home>
   <phone>number3</phone>
  </home> 
  <PERNR> 003 </PERNR>
  <PARDT>#### 3 ####</PARDT>
</phone> 
</rfc:ZHR_GET_EMP_BENEFIT_DETAILS.Response xmlns:rfc="urn:sap-com:document:sap:rfc:functions">    
</multimap:Message2>
</multimap:Messages>
  • a lot of modifications could be done. but first - stop using `.'**'.` - better to use exact gpath (full or relative). – daggett Jan 01 '21 at 01:31
  • and your code does not correspond to your previous, almost deleted question. please edit this question and add a short xml sample to execute this code. – daggett Jan 01 '21 at 01:40
  • Thank you for your response @daggett . I have added sample XML example in my question, as asked in my previous question I have ' and ' tags in my XML which not allowing me to use exact gpath so I am using '**' to traverse to the node I wanted. Any help is greatly appreciated. Thank you again. – Pradeep Bondla Jan 03 '21 at 01:04
  • i don't see `PARDT` element in xml that is referenced from code. also i don't see any case when `it.PERNR.text() == '000'+perID.personIdExternal.text()` in your xml... – daggett Jan 03 '21 at 16:41
  • If you can stomach the api, you could go for a faster XML library like: https://vtd-xml.sourceforge.io/ – Matias Bjarland Jan 01 '21 at 01:09

1 Answer


some major issues in your code:

  • using .'**' accessors. if you have 10000 persons in message1, then xml.'**' returns a list of at least count(person)+count(EmpEmployment)+count(personIdExternal) = 10000*3 elements, and every findAll or find called on that list has to scan all of those elements
  • inside the main loop xml.'**'.findAll{it.name() == 'EmpEmployment'}.each{ you rebuild those large lists on every iteration for no reason. for example, after the expression def perID = xml.'**'.find{it.personIdExternal.text() == p.personIdExternal.text()} perID is simply equal to p. with two full scans per record, the total work grows quadratically with the number of records
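
to make the two points above concrete, here is a minimal sketch on a tiny, made-up document (the `<root>` wrapper and the two records are assumptions for illustration, not the asker's real data). it shows that '**' materialises every node of the tree, and that the inner find only rediscovers the node the loop already holds, assuming the IDs are unique:

```groovy
import groovy.xml.XmlParser  // Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlParser is auto-imported)

// Tiny made-up sample: two employee records under a hypothetical <root>
def xml = new XmlParser().parseText('''
<root>
  <person>
    <EmpEmployment><personIdExternal>001</personIdExternal></EmpEmployment>
  </person>
  <person>
    <EmpEmployment><personIdExternal>002</personIdExternal></EmpEmployment>
  </person>
</root>''')

// '**' is shorthand for depthFirst(): it builds a list of all nodes in the tree
def all = xml.'**'
println "nodes visited by a single '**' scan: ${all.size()}"

def emps = all.findAll { it instanceof Node && it.name() == 'EmpEmployment' }

// For each record, repeat the question's lookup and check whether it
// returns the very same node object that the loop variable already holds
def sameNode = emps.collect { p ->
    def perID = xml.'**'.find {
        it instanceof Node && it.personIdExternal.text() == p.personIdExternal.text()
    }
    perID.is(p)
}
println "lookup returned the loop variable itself: ${sameNode}"
```

so each iteration pays for a full tree scan just to get back a reference it already had, and then pays for another full scan to find the matching PERNR.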

your code still does not correspond to the xml sample.

so, i'm going to make some assumptions to show how you could build a gpath without .'**':

let's assume we have xml like this:

<?xml version='1.0' encoding='UTF-8'?>
<multimap:Messages xmlns:multimap="http://sap.com/xi/XI/SplitAndMerge">
<multimap:Message1>
  <person>
      <person>
        <EmpEmployment>
          <personIdExternal>001</personIdExternal>
        </EmpEmployment>
      </person>
  </person>
</multimap:Message1>
<multimap:Message2>
  <phone>
      <xyz>
        <PERNR>000001</PERNR>
        <PARDT>#### 1 ####</PARDT>
      </xyz>
  </phone>
</multimap:Message2>
</multimap:Messages>

this code builds a large xml message for testing:

def count = 60000 //just for test let's create xml with 60K elements
def msg = '''<?xml version='1.0' encoding='UTF-8'?>
<multimap:Messages xmlns:multimap="http://sap.com/xi/XI/SplitAndMerge">
<multimap:Message1>
  <person>
'''+
(1..count).collect{"""\
      <person>
        <EmpEmployment>
          <personIdExternal>${String.format('%03d',it)}</personIdExternal>
        </EmpEmployment>
      </person>
"""}.join()+
'''  </person>
</multimap:Message1>
<multimap:Message2>
  <phone>
'''+
(1..count).collect{"""\
      <xyz>
        <PERNR>${String.format('%06d',it)}</PERNR>
        <PARDT>#### ${it} ####</PARDT>
      </xyz>
"""}.join()+
'''  </phone>
</multimap:Message2>
</multimap:Messages>
'''

and now the modified transforming algorithm:

def xml = new XmlParser().parseText(msg)

def t = System.currentTimeMillis()
def ns = new groovy.xml.Namespace('http://sap.com/xi/XI/SplitAndMerge')

// for fast lookup, map each PERNR value to the node that contains it
def pernrMap=xml[ns.Message2][0].phone[0].children().collectEntries{ [it.PERNR.text(), it] }

// iterate msg1 -> find entry in pernrMap -> add node
xml[ns.Message1][0].person[0].person.each{p->
    def emp = p.EmpEmployment[0]
    def pernr = pernrMap['000'+emp.personIdExternal.text()]
    if(pernr) emp.appendNode('erpBenEligibility', null, pernr.PARDT.text() )
}
groovy.xml.XmlUtil.serialize(xml)
println "t = ${(System.currentTimeMillis()-t)/1000} sec"

even with 60k elements in msg1 and msg2, the transformation takes less than 1 second.
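
one caveat: the map lookup only hits when both sides produce exactly the same key string. the sample in the question has padded values such as ` 001 `, and, depending on the parser's whitespace settings, text() may keep that padding. the sketch below is one possible way to handle it, not part of the tested code above: `normKey` is a hypothetical helper, and the 6-digit zero-padded format, the `<root>`, `<emps>` and `<benefits>` wrappers are assumptions taken from or added to the simplified sample.

```groovy
import groovy.xml.XmlParser  // Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlParser is auto-imported)

// Hypothetical helper: strip padding, then left-pad with zeros to 6 digits
// (the 6-digit PERNR format is an assumption based on the simplified sample)
def normKey = { String s -> s.trim().padLeft(6, '0') }

// Made-up document: padded IDs on one side, zero-prefixed PERNR on the other
def xml = new XmlParser().parseText('''
<root>
  <emps>
    <EmpEmployment><personIdExternal> 001 </personIdExternal></EmpEmployment>
    <EmpEmployment><personIdExternal> 002 </personIdExternal></EmpEmployment>
  </emps>
  <benefits>
    <xyz><PERNR>000001</PERNR><PARDT>#### 1 ####</PARDT></xyz>
  </benefits>
</root>''')

// Build the index once, with normalised keys
def pernrMap = xml.benefits[0].children().collectEntries { [normKey(it.PERNR.text()), it] }

// Normalise the lookup key the same way before consulting the map
xml.emps[0].EmpEmployment.each { emp ->
    def pernr = pernrMap[normKey(emp.personIdExternal.text())]
    if (pernr) emp.appendNode('erpBenEligibility', pernr.PARDT.text())
}

println groovy.xml.XmlUtil.serialize(xml)
```

normalising both sides with the same function keeps the comparison in one place instead of hard-coding the '000' prefix at the lookup.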

daggett
  • Thank you so much @daggett, in my example to be simple I didn't give exact structure of message2 structure, it has one more name sp ` **** subnodes as in previous exmples **** ` I tried ` def ns2 = new groovy.xml.Namespace('urn:sap-com:document:sap:rfc:functions') def pernrMap = xml[ns.message2.ns2][0].LT_0167[0].children().collectEntries{ [it.PERNR.text(), it] }` But it did not work. please help. – Pradeep Bondla Jan 05 '21 at 16:40
  • Edited my sample xml to add the node in message2 block, please check. tried different ways to access the children after like in my above comment and also def pernrMap = xml[ns.Message2][0].ns2[0].phone[0].children().collectEntries{ [it.PERNR.text(), it] } where ns2 is the namespace in message2 block as in above comments. Please disregard LT_0167[0] in above comment, it is equal to phone[0]. error shows I am accessing null object as just 'xml[ns.Message2][0].phone[0].children().` doesn't work because of the "" namespace node. – Pradeep Bondla Jan 05 '21 at 17:54
  • you have to declare another namespace in your code like this: `def rfcNs = new groovy.xml.Namespace('urn:sap-com:document:sap:rfc:functions')` – daggett Jan 05 '21 at 19:25
  • Thank you again. I did that but how to navigate through the message2 & name space. I did below but it didnt work. `def pernrMap = xml[ns.message2.rfcNs][0].phone[0].children().collectEntries{ [it.PERNR.text(), it] }` also tried `def pernrMap = xml[ns.Message2][0].rfcNs[0].phone[0].children().collectEntries{ [it.PERNR.text(), it] }` what is the right way to access PERNR after declaring rfc namespace? Please help. – Pradeep Bondla Jan 05 '21 at 19:31
  • `xml[ns.Message2][0][rfcNs.get("ZHR_GET_EMP_BENEFIT_DETAILS.Response")][0].phone[0]...` – daggett Jan 05 '21 at 19:34
  • if you want help please correct xml in your question: 1) xmlns can't be in closing tag. 2) `PERNR` and `PARDT` in your question have no relation to each other all of them have the same parent - `phone`, so it's not possible to identify their relation. – daggett Jan 05 '21 at 23:10
  • Thank you so much. Your code works. This decreased the processing time from 7 hours to 4 seconds. Now I understand how name space works and how to navigate through the xml node a bit. Thanks again. Do I need to mark this question as answered? Will I be able to assign points to you? I am new to this forum. – Pradeep Bondla Jan 06 '21 at 01:27
  • marking it as answered gives 15 points, so you are welcome to do it ) – daggett Jan 06 '21 at 17:29
  • I don't see where I can mark it "answered"; I only see the share, edit, follow and flag options. Anyhow, thank you so much for providing the solution. – Pradeep Bondla Jan 06 '21 at 23:40
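
The namespace navigation discussed in the comments above can be sketched as follows. This is only an illustration under assumptions: the Message2 payload is rewritten so that it is well-formed (no xmlns on the closing tag), and each PERNR/PARDT pair is wrapped in its own parent element, here hypothetically named `item`, so that the two values can be related to each other. The real RFC response may use different element names.

```groovy
import groovy.xml.XmlParser  // Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlParser is auto-imported)
import groovy.xml.Namespace

// Assumed, corrected shape of Message2: <item> is a hypothetical wrapper name
def xml = new XmlParser().parseText('''<?xml version='1.0' encoding='UTF-8'?>
<multimap:Messages xmlns:multimap="http://sap.com/xi/XI/SplitAndMerge">
<multimap:Message2>
  <rfc:ZHR_GET_EMP_BENEFIT_DETAILS.Response xmlns:rfc="urn:sap-com:document:sap:rfc:functions">
    <phone>
      <item><PERNR>000001</PERNR><PARDT>#### 1 ####</PARDT></item>
      <item><PERNR>000002</PERNR><PARDT>#### 2 ####</PARDT></item>
    </phone>
  </rfc:ZHR_GET_EMP_BENEFIT_DETAILS.Response>
</multimap:Message2>
</multimap:Messages>''')

// One Namespace object per namespace URI used in the document
def ns    = new Namespace('http://sap.com/xi/XI/SplitAndMerge')
def rfcNs = new Namespace('urn:sap-com:document:sap:rfc:functions')

// The element name contains a dot, so it cannot be written as rfcNs.Name;
// get() builds the qualified name from a plain string instead
def response = xml[ns.Message2][0][rfcNs.get('ZHR_GET_EMP_BENEFIT_DETAILS.Response')][0]

// <phone> and its children carry no namespace, so plain property access works
def pernrMap = response.phone[0].children().collectEntries { [it.PERNR.text(), it] }

println pernrMap.keySet()
```

Each namespaced step is indexed with a qualified name from the matching Namespace object, while elements without a namespace are reached with ordinary property access.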