I have an XML file containing an element whose value looks like this:

<ProjectNote>
    <Note>&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.0 Transitional//EN&quot;&gt;
        &lt;HTML&gt;&lt;HEAD&gt;&lt;TITLE&gt;&lt;/TITLE&gt;
        &lt;META http-equiv=Content-Type content=&quot;text/html; charset=unicode&quot;&gt;
        &lt;META content=&quot;MSHTML 6.00.3790.4944&quot; name=GENERATOR&gt;&lt;/HEAD&gt;
        &lt;BODY bgColor=#ffffff&gt;
        &lt;P&gt;Key Deliverables&lt;/P&gt;
        &lt;UL&gt;
        &lt;LI&gt;schedule development 
        &lt;LI&gt;scope development (SOW) 
        &lt;LI&gt;business case (depending on project) 
        &lt;LI&gt;contracts (who will be used) 
        &lt;LI&gt;overall budget 
        &lt;LI&gt;Assign Key Stakeholders 
        &lt;LI&gt;Sitewalks and PreCon Meetings 
        &lt;LI&gt;Need Clearance?&lt;/LI&gt;&lt;/UL&gt;
        &lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;
    </Note>
</ProjectNote>

I am reading this file with a Groovy script, making some changes, and writing it back. However, &quot; is converted to " when the file is parsed with XmlSlurper. I don't want to change anything in the file other than adding a new node to it. How can I keep the rest of the file exactly as it is?

I am using the following code:

package test

import groovy.xml.*
/**
 * A Simple Example that searches information from XML parsed by XmlSlurper.
 */
class Test {
    static srcXMLPath = "C:/SRC_Project/628548_C453_Original.xml"
    static updXMLPath = "C:/SRC_Project/628548_C453_Updated.xml"
    static def writer
    static main(args) {
        File srcFile = new File(srcXMLPath)
        def baseXMLStr = new XmlSlurper(false,false).parse(srcFile)
        def  newXMLStr = new groovy.xml.StreamingMarkupBuilder().bind {
            List_Wrapper {
                mkp.yield baseXMLStr
            }
        }
        writer = new FileWriter(updXMLPath)
        groovy.xml.XmlUtil.serialize( newXMLStr,writer )
        writer.close()

    }
}

However, the updated file comes out like this, which is not what I intended:

<ProjectNote>
    <Note>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"&gt;    
        &lt;HTML&gt;&lt;HEAD&gt;&lt;TITLE&gt;&lt;/TITLE&gt;
        &lt;META http-equiv=Content-Type content="text/html; charset=unicode"&gt;
        &lt;META content="MSHTML 6.00.3790.4944" name=GENERATOR&gt;&lt;/HEAD&gt;
        &lt;BODY bgColor=#ffffff&gt;
        &lt;P&gt;Key Deliverables&lt;/P&gt;
        &lt;UL&gt;
        &lt;LI&gt;As Builts (if needed) 
        &lt;UL&gt;
        &lt;LI&gt;Mapping &amp;amp; Design Drawings&lt;/LI&gt;&lt;/UL&gt;
        &lt;LI&gt;Engineer needs final approval 
        &lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;
    </Note>
</ProjectNote>

Could someone tell me how to avoid this? The other escaped characters are clearly left unchanged.

Rao

1 Answer

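You can fix it after building the markup by re-escaping the quotes in the serialized output. Note that this replaces every double quote, including the ones that delimit attribute values and those in the XML declaration: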

new File(updXMLPath) << XmlUtil.serialize(newXMLStr).replaceAll('"', '&quot;')
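To illustrate the effect of the blanket replacement, here is a minimal sketch on a hypothetical serialized string (the `serialized` value and its `id` attribute are made up for illustration, not taken from the real file). It also shows one possible narrower regex that only touches quotes in text content. That variant assumes the serializer escapes `>` in text as `&gt;` and that there are no comments or CDATA sections containing quotes, so treat it as a starting point rather than a general solution.

```groovy
// Hypothetical serialized output standing in for XmlUtil.serialize(newXMLStr).
def serialized = '<?xml version="1.0" encoding="UTF-8"?><Note id="n1">content="text/html"</Note>'

// Blanket replacement: also rewrites the quotes in the declaration and around attribute values.
def blanket = serialized.replaceAll('"', '&quot;')
assert blanket.contains('id=&quot;n1&quot;')

// Narrower sketch: only touch a quote when the next angle bracket after it is '<',
// which means the quote sits in text content rather than inside a tag.
def narrow = serialized.replaceAll(/"(?=[^<>]*<)/, '&quot;')
assert narrow == '<?xml version="1.0" encoding="UTF-8"?><Note id="n1">content=&quot;text/html&quot;</Note>'
```

If the original file mixes `&quot;` and literal `"` in text content, neither variant can restore the exact original, since that distinction is lost once the file has been parsed.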

But if you only want to add a wrapper, you don't need to parse the source XML at all; you can pass the source file's text to the markup builder as is:

    def  newXMLStr = new StreamingMarkupBuilder().bind {
        List_Wrapper {
            mkp.yieldUnescaped srcFile.text
        }
    }
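As a quick check that `mkp.yieldUnescaped` leaves the entities alone, the following sketch uses a small inline string (the hypothetical `srcText`) in place of `srcFile.text`:

```groovy
import groovy.xml.StreamingMarkupBuilder

// Hypothetical inline sample standing in for srcFile.text.
def srcText = '<ProjectNote><Note>&lt;META content=&quot;text/html&quot;&gt;</Note></ProjectNote>'

// yieldUnescaped writes the string into the output verbatim, without parsing or re-escaping it.
def wrapped = new StreamingMarkupBuilder().bind {
    List_Wrapper {
        mkp.yieldUnescaped srcText
    }
}.toString()

// The &quot; entities survive because the text was never parsed.
assert wrapped.contains('content=&quot;text/html&quot;')
assert wrapped.startsWith('<List_Wrapper>')
```

Because the text is copied verbatim, nothing validates it either: if the source is not well-formed, the wrapped result will not be well-formed.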

Finally, if you only need to add a single List_Wrapper tag, it may be simpler to do:

new File(updXMLPath) << "<List_Wrapper>${new File(srcXMLPath).text}</List_Wrapper>"
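One caveat with wrapping the raw text: if the source file starts with an XML declaration, that declaration would end up inside the wrapper element, which is not well-formed. The sketch below strips a leading declaration first. The sample `srcText` is hypothetical, and the regex is a simple assumption that does not handle a byte-order mark or a DOCTYPE.

```groovy
// Hypothetical sample standing in for new File(srcXMLPath).text.
def srcText = '<?xml version="1.0" encoding="UTF-8"?>\n<ProjectNote><Note>&quot;quoted&quot;</Note></ProjectNote>'

// Drop a leading XML declaration, if any, so it does not end up inside the wrapper element.
def body = srcText.replaceFirst(/^\s*<\?xml[^>]*\?>\s*/, '')

// Wrap the untouched remainder; the entities are preserved because nothing is parsed.
def wrapped = "<List_Wrapper>${body}</List_Wrapper>"

assert wrapped == '<List_Wrapper><ProjectNote><Note>&quot;quoted&quot;</Note></ProjectNote></List_Wrapper>'
```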
Evgeny Smirnov
  • But what if the parser converts other escaped characters as well? And if I replace all the '"' with '&quot;', wouldn't that change the attribute values as well? I only want to keep my source XML exactly as it was before I made changes to the file (that is, adding a new tag) – Sayantani Roy Chaudhuri Jan 04 '18 at 13:33
  • Sorry, I thought that you needed to operate on the source XML nodes too. Updated the answer – Evgeny Smirnov Jan 04 '18 at 14:30