1

I need to remove the processing instruction from a DOM. I load several files, merge them, and save the result. The problem is that the output looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<frag>
    Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
<?xml version="1.0" encoding="iso-8859-2" standalone="no"?>
<frag>
    Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<frag>
    Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>

I haven't found a way to either remove the <?xml ...?> processing instruction from the DOM or have it ignored when saving the resulting DOM. I'm using Java 6 and the default parser.

Theodor Keinstein

2 Answers

3

There is no method for removing it. The <?xml ...?> line is the XML declaration, and the DOM does not store it as a node that you could delete.

Your merge process is broken. I'll bet you're reading the fragment files and simply concatenating the strings together to create this example.

The right way to do it is to parse each fragment and add the Elements you want to the final DOM, which is then output.

Even if you remove the processing instruction, what you've posted is invalid XML. There's no root tag that I can see, and you must have one and only one.
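As a rough sketch of the approach described above, assuming each fragment file has already been read into a byte array: every fragment is parsed on its own, so the parser can honour that fragment's declared encoding, and its root element is then imported under one new root. The class name `FragmentMerger` and the wrapper element name passed as `rootName` are made up for illustration. `OutputKeys.OMIT_XML_DECLARATION` is the standard JAXP output property for leaving the declaration out when saving.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of any library.
public final class FragmentMerger {

    /** Parses every fragment and imports its root element under one new root element. */
    public static Document merge(List<byte[]> fragments, String rootName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document merged = builder.newDocument();
            Element root = merged.createElement(rootName);
            merged.appendChild(root);
            for (byte[] fragment : fragments) {
                // The parser reads the encoding from each fragment's own declaration.
                Document parsed = builder.parse(new ByteArrayInputStream(fragment));
                // importNode copies the element subtree into the target document;
                // the source declaration is not part of that subtree.
                root.appendChild(merged.importNode(parsed.getDocumentElement(), true));
            }
            return merged;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not merge fragments", e);
        }
    }

    /** Serializes the document, optionally without the leading XML declaration. */
    public static String serialize(Document doc, boolean omitDeclaration) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(
                    OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }
}
```

Calling `FragmentMerger.serialize(FragmentMerger.merge(fragments, "frags"), false)` should give a single document with one declaration at the top and one root element, whatever encodings the input fragments declared.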

duffymo
  • Ok I understand. I have to create an empty DOM and merge all fragments into it. – Theodor Keinstein Nov 22 '11 at 10:09
  • By the way, simply using regular expressions would be an option if it weren't for the fact that multiple incompatible character sets are being used. In this case, a DOM solution would be costlier but *much* easier. –  Nov 22 '11 at 10:10
  • RegEx would work here, but you don't even need that. Just remove the first line from each file. – AlexR Nov 22 '11 at 10:16
1

You can remove processing instructions by using the StAX API, an XMLStreamReader for example. You can create a filtered reader using XMLInputFactory.createFilteredReader and a StreamFilter.

There is a constant XMLStreamConstants.PROCESSING_INSTRUCTION that can help your filter recognize the processing instructions and hold them back.

Something similar is definitely possible with SAX too.
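A minimal sketch of such a filter, using only the standard `javax.xml.stream` API. The class name `PiFilterDemo` and its two methods are invented for this example. One caveat: as far as I know, StAX does not report the `<?xml ...?>` declaration as a PROCESSING_INSTRUCTION event (its version and encoding are exposed on the reader at START_DOCUMENT), so this filter holds back genuine processing instructions, and whether a declaration gets written is decided by whatever serializes the output.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of any library.
public final class PiFilterDemo {

    /** Wraps a reader so that PROCESSING_INSTRUCTION events are never delivered. */
    public static XMLStreamReader withoutProcessingInstructions(Reader xml)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader raw = factory.createXMLStreamReader(xml);
        StreamFilter dropPi = new StreamFilter() {
            public boolean accept(XMLStreamReader reader) {
                // Accept everything except processing instructions.
                return reader.getEventType() != XMLStreamConstants.PROCESSING_INSTRUCTION;
            }
        };
        return factory.createFilteredReader(raw, dropPi);
    }

    /** Returns the event types that next() delivers from the filtered reader, in order. */
    public static List<Integer> eventTypes(String xml) {
        try {
            XMLStreamReader reader = withoutProcessingInstructions(new StringReader(xml));
            List<Integer> seen = new ArrayList<Integer>();
            while (reader.hasNext()) {
                seen.add(reader.next());
            }
            reader.close();
            return seen;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }
}
```

In real use you would forward the accepted events to an XMLStreamWriter or similar rather than just collecting their types.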

Regardless of the technical feasibility, the merge really looks broken as suggested by duffymo.

kostja
  • I know this is old, but it's the first post that came back in a search and it is not correct. https://stackoverflow.com/questions/277996/jaxb-remove-standalone-yes-from-generated-xml `marshaller.setProperty("com.sun.xml.bind.xmlDeclaration", Boolean.FALSE);` or `marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);` – Andrew Harris Jun 23 '17 at 09:12
  • Different issue IMO - this answer is about removing the declaration once it is already there. Yours is about not putting it there in the first place. Is removing the declaration after the fact the best way for your use case? It is not. Does it work? It does. – kostja Jun 23 '17 at 09:20