2

I have a XML file having nested tags. We can use DOM, JDOM parser I want to replace inside the string of all tag from single quote(') to double quote in whole XML file. tag can be nested inside tags also. I want some for loop which looks for all tag and replace value like HYPER SHIPPING'SDN BHD_First_Page --> HYPER SHIPPING''SDN BHD_First_Page

Sample code

    public void iterateChildNodes(org.jdom.Element parentNode) {
        if(parentNode.getChildren().size() == 0) {
            if(parentNode.getText().contains("'")) {
                parentNode.setText(parentNode.getText().replaceAll("'", "\'"));
                LOGGER.info("*************  Below Value updated");
                LOGGER.info(parentNode.getText());
            }
        }else {
            List<Element> rec = parentNode.getChildren();
            for(Element i : rec) {
                iterateChildNodes(i);
            }
        }
    }

Sample XML File

    <Document>
        <Identifier>DOC1</Identifier>
        <Type>HYPER SHIPPING SDN BHD</Type>
        <Description>HYPER SHIPPING SDN BHD</Description>
        <Confidence>33.12</Confidence>
        <ConfidenceThreshold>10.0</ConfidenceThreshold>
        <Valid>true</Valid>
        <Reviewed>true</Reviewed>
        <ReviewedBy>SYSTEM</ReviewedBy>
        <ValidatedBy>SYSTEM</ValidatedBy>
        <ErrorMessage/>
        <Value>HYPER SHIPPING'SDN BHD_First_Page</Value>  //Value to be replaced here
        <DocumentDisplayInfo/>
        <DocumentLevelFields/>
        <Pages>
            <Page>
                <Identifier>PG0</Identifier>
                <OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
                <NewFileName>BI2E7_0.tif</NewFileName>
                <SourceFileID>1</SourceFileID>
                <PageLevelFields>
                    <PageLevelField>
                        <Name>Search_Engine_Classification</Name>
                        <Value>Park Street '10 road</Value>     //Value to be replaced here
                        <Type/>
                        <Confidence>66.23</Confidence>
                        <LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
                        <OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
                        <OcrConfidence>0.0</OcrConfidence>
                        <FieldOrderNumber>0</FieldOrderNumber>
                        <ForceReview>false</ForceReview>
                    </PageLevelField>
                </PageLevelFields>
            </Page>
        </Pages>
    </Document>
  • 1
    what have you tried so far? – Stultuske Feb 19 '21 at 14:25
  • 1
    The reason why this: value =value.replace("'","''"); won't help you, is because: a. it's a replace, should be a replaceAll, there could be more than one ' and b: ' ' and " are not the same. you'll need to do something like value.replaceAll("'", "\""); But, if you try to do that, why not just read the entire xml file in a String? one replaceAll might be able to fix your entire issue. – Stultuske Feb 19 '21 at 14:31
  • String#replace() will replace all items. It doesn't use a regex. – NomadMaker Feb 20 '21 at 14:08

3 Answers3

1

This code can replace all ' with " from an XML file.

Adding no description here, try to code step by step. It is very easy to understand.

(Updated)

Part 1: Using JDOM

import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.NodeList;
import org.jdom2.input.SAXBuilder;
import org.jdom2.transform.JDOMSource;
import org.w3c.dom.*;

import java.io.*;

public class XmlModificationJDom {

    public static void main(String[] args) {
        XmlModificationJDom xmlModificationJDom = new XmlModificationJDom();
        xmlModificationJDom.updateXmlAndSaveJDom();

    }

    public void updateXmlAndSaveJDom() {
        try {
            File inputFile = new File("document.xml");
            SAXBuilder saxBuilder = new SAXBuilder();
            org.jdom2.Document xmlDocument = saxBuilder.build(inputFile);
            org.jdom2.Element rootElement = xmlDocument.getRootElement();

            iterateAndUpdateElementsUsingJDom(rootElement);

            saveUpdatedXmlUsingJDomSource(xmlDocument);

        } catch (Exception ex) {
            ex.printStackTrace();
        }

    }

    public void iterateAndUpdateElementsUsingJDom(org.jdom2.Element element) {

        if (element.getChildren().size() == 0) {
            // System.out.println(element.getName() + ","+ element.getText());
            if (element.getText().contains("'")) {
                element.setText(element.getText().replaceAll("\'", "\""));
            }
        } else {
            // System.out.println(element.getName());
            for (org.jdom2.Element childElement : element.getChildren()) {
                iterateAndUpdateElementsUsingJDom(childElement);
            }
        }
    }
}

Part 2: Using DOM

import javax.xml.parsers.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import java.util.ArrayList;
import java.util.List;

import java.io.*;

public class XmlModificationDom {

    public static void main(String[] args) {
        XmlModificationDom XmlModificationDom = new XmlModificationDom();
        XmlModificationDom.updateXmlAndSave();
    }
    
    public void updateXmlAndSave() {
        try {
            File inputFile = new File("document.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document document = dBuilder.parse(inputFile);
            document.getDocumentElement().normalize();

            Node parentNode = document.getFirstChild();
            iterateChildNodesAndUpate(parentNode);

            writeAndSaveXML(document);

        } catch (Exception ex) {
            ex.printStackTrace();
        }

    }

    public void writeAndSaveXML(Document document) throws Exception {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(document);
        StreamResult result = new StreamResult(new File("updated-document.xml"));
        transformer.transform(source, result);
    }

    public void iterateChildNodesAndUpate(Node parentNode) {

        NodeList nodeList = parentNode.getChildNodes();

        for (int index = 0; index < nodeList.getLength(); index++) {
            Node node = nodeList.item(index);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) node;
                //System.out.print(element.getNodeName());

                if (element.hasChildNodes() && element.getChildNodes().getLength() > 1) {
                    //System.out.println("Child > " + element.getNodeName());
                    iterateChildNodesAndUpate(element);
                } else {
                    //System.out.println(" - " + element.getTextContent());
                    if (element.getTextContent().contains("'")) {
                        String str = element.getTextContent().replaceAll("\'", "\"");
                        element.setTextContent(str);
                    }
                }
            }
        }
    }
}

Input file document.xml:

<Document>
        <Identifier>DOC1</Identifier>
        <Type>HYPER SHIPPING SDN BHD</Type>
        <Description>HYPER SHIPPING SDN BHD</Description>
        <Confidence>33.12</Confidence>
        <ConfidenceThreshold>10.0</ConfidenceThreshold>
        <Valid>true</Valid>
        <Reviewed>true</Reviewed>
        <ReviewedBy>SYSTEM</ReviewedBy>
        <ValidatedBy>SYSTEM</ValidatedBy>
        <ErrorMessage/>
        <Value>HYPER SHIPPING'SDN BHD_First_Page</Value>  //Value to be replaced here
        <DocumentDisplayInfo/>
        <DocumentLevelFields/>
        <Pages>
            <Page>
                <Identifier>PG0</Identifier>
                <OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
                <NewFileName>BI2E7_0.tif</NewFileName>
                <SourceFileID>1</SourceFileID>
                <PageLevelFields>
                    <PageLevelField>
                        <Name>Search_Engine_Classification</Name>
                        <Value>Park Street '10 road</Value>     //Value to be replaced here
                        <Type/>
                        <Confidence>66.23</Confidence>
                        <LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
                        <OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
                        <OcrConfidence>0.0</OcrConfidence>
                        <FieldOrderNumber>0</FieldOrderNumber>
                        <ForceReview>false</ForceReview>
                    </PageLevelField>
                </PageLevelFields>
            </Page>
        </Pages>
</Document>

Output updated-document.xml/updated-document-jdom.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Document>
        <Identifier>DOC1</Identifier>
        <Type>HYPER SHIPPING SDN BHD</Type>
        <Description>HYPER SHIPPING SDN BHD</Description>
        <Confidence>33.12</Confidence>
        <ConfidenceThreshold>10.0</ConfidenceThreshold>
        <Valid>true</Valid>
        <Reviewed>true</Reviewed>
        <ReviewedBy>SYSTEM</ReviewedBy>
        <ValidatedBy>SYSTEM</ValidatedBy>
        <ErrorMessage/>
        <Value>HYPER SHIPPING"SDN BHD_First_Page</Value><DocumentDisplayInfo/>
        <DocumentLevelFields/>
        <Pages>
            <Page>
                <Identifier>PG0</Identifier>
                <OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
                <NewFileName>BI2E7_0.tif</NewFileName>
                <SourceFileID>1</SourceFileID>
                <PageLevelFields>
                    <PageLevelField>
                        <Name>Search_Engine_Classification</Name>
                        <Value>Park Street "10 road</Value><Type/>
                        <Confidence>66.23</Confidence>
                        <LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
                        <OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
                        <OcrConfidence>0.0</OcrConfidence>
                        <FieldOrderNumber>0</FieldOrderNumber>
                        <ForceReview>false</ForceReview>
                    </PageLevelField>
                </PageLevelFields>
            </Page>
        </Pages>
</Document>

More details code, visit this repo

Md Kawser Habib
  • 1,966
  • 2
  • 10
  • 25
  • @Kashyap Savsani, if this answer is helpful and solves your problem, then you can accept and upvote it. Please read this https://stackoverflow.com/help/someone-answers – Md Kawser Habib Feb 20 '21 at 11:36
  • Thanks for your answer it really helps. I understand you are using the DOM parser get the results. I have a specific requirement of using the **JDOM** parser. I tried the code using your logic and edited the sample code in post. But still some logic does not work for some reason. Could you please look and make me help the changes? Like if (node.getNodeType() == Node.ELEMENT_NODE) { Element element = (Element) node; – Kashyap Savsani Feb 20 '21 at 14:08
  • Sorry, for my misunderstanding. I update my answer and add code for JDOM. Hope, you accept this answer. – Md Kawser Habib Feb 20 '21 at 16:32
  • Sorry but I am getting this error if you can help. Attaching updated code in post. I am using JDOM1 and not JDOM2 **error: incompatible types: Object cannot be converted to Element for(Element i: parentNode.getChildren()) {** ^ – Kashyap Savsani Feb 20 '21 at 17:49
  • I managed to handle error. above is the updated code. Thanks a lot – Kashyap Savsani Feb 20 '21 at 18:25
  • Great to hear from you. – Md Kawser Habib Feb 20 '21 at 22:45
0

you need to add backslash on single quote and double quote

value =value.replace("\'","\"");
Eduard A
  • 370
  • 3
  • 9
0

Just replace the removeQuote method with

private static void removeQuote(Document batchXml) throws JDOMException, Exception {
        Element root = batchXml.getRootElement();
        List<Element> docs = root.getChild("Documents").getChildren("Document");
        for (Element doc : docs) {
            String docType = doc.getChildText("Value");
             value =value.replaceAll("\'", "\"");
        }
    }
Sunny
  • 407
  • 5
  • 15