
I'm writing a small application in Java that uses XOM to output XHTML.

The problem is that XOM places the following tag before all the html:

<?xml version="1.0" encoding="UTF-8"?>

I've read their documentation, but I can't seem to find how to remove this tag. Thanks guys.

Edit: I'm outputting to a file using XOM's Serializer class

Follow up: If it is good practice to use the XML tag before the DOCTYPE, why don't any websites use it? Also, why does the W3C validator give me an error when it sees the XML tag? Here is the error:

Illegal processing instruction target (found xml)

Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?

Nathan
  • I guess this is a dumb question, but why is that bad? – MJB Apr 30 '11 at 01:58
  • @MJB It's bad because it doesn't look pretty :-). I like seeing the DOCTYPE as the first line of code when I open up the source – Nathan Apr 30 '11 at 06:07
  • Minor nitpick: that is NOT a tag, it is an XML declaration (it is not even a processing instruction, although it looks like one). This is an important difference, as the rules for the declaration are very different from those for tags. – StaxMan Oct 07 '11 at 15:33

3 Answers


Does this work? The Javadoc for XOM's Serializer class lists this method:

protected void writeXMLDeclaration() throws IOException

You could override it in a subclass and have it do nothing.

That said, I agree that you should normally output the XML declaration.
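
The override suggested above might look like the following sketch. It assumes the XOM jar is on the classpath; the class names NoDeclarationSerializer and NoDeclarationDemo, and the tiny document it builds, are made up for illustration. Only writeXMLDeclaration() comes from the Javadoc quoted in this answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

import nu.xom.DocType;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Serializer;

// Hypothetical subclass name; not part of XOM itself.
class NoDeclarationSerializer extends Serializer {

    NoDeclarationSerializer(OutputStream out) {
        super(out); // the single-argument constructor uses UTF-8
    }

    @Override
    protected void writeXMLDeclaration() throws IOException {
        // Intentionally empty: nothing is written for <?xml ...?>.
    }
}

// Hypothetical demo class that serializes a minimal XHTML document.
class NoDeclarationDemo {

    static String render() {
        String ns = "http://www.w3.org/1999/xhtml";
        Element html = new Element("html", ns);
        html.appendChild(new Element("body", ns));

        Document doc = new Document(html);
        doc.setDocType(new DocType("html",
                "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));

        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Serializer serializer = new NoDeclarationSerializer(out);
            serializer.write(doc);
            serializer.flush();
            return out.toString("UTF-8");
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

To write to a file instead, the same subclass could presumably be handed a FileOutputStream in place of the in-memory stream.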

MJB
  • This looks like it would work nicely. I would do this, but you guys are convincing me to keep the XML tag. If I keep the XML tag, does that mean I don't have to specify <meta charset="UTF-8" /> in my HTML header? – Nathan Apr 30 '11 at 06:10
  • No, you don't have to. In fact, having the XML declaration means that you do not need to specify the encoding elsewhere, as XML parsers can detect it from the declaration. – StaxMan Oct 07 '11 at 15:35

The tag is valid as XML and XHTML, and good practice. There should be no reason to remove it.

Just leave it there ... or fix whatever it is that is expecting it not to be there.


If you don't believe me, take a look at this excerpt from the XHTML 1.1 spec.

"Example of an XHTML 1.1 document

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
 <html version="-//W3C//DTD XHTML 1.1//EN"
       xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/1999/xhtml
                      http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
 >
   <head>
     <title>Virtual Library</title>
   </head>
   <body>
     <p>Moved to <a href="http://example.org/">example.org</a>.</p>
   </body>
 </html>

Note that in this example, the XML declaration is included. An XML declaration like the one above is not required in all XML documents. XHTML document authors SHOULD use XML declarations in all their documents. XHTML document authors MUST use an XML declaration when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding is specified by a higher-level protocol."


By the way, the W3C validation service says that is OK ... but if there is any whitespace before the <?xml ...?> tag it complains.

Stephen C
  • Really? That's interesting... I haven't seen websites use it before, and I always thought that the DOCTYPE had to be the very first line of code. (I believe if there's even just a line break before the DOCTYPE, it throws older IE browsers into quirks mode.) Also, why does the W3C validator give an error when it reads that line? Here is the error: "Illegal processing instruction target (found xml)". Thank you for your answer! – Nathan Apr 30 '11 at 06:06
  • @Nathan - I suspect the validators, etc are complaining about something else. See my updated answer. – Stephen C Apr 30 '11 at 07:52
  • Thank you for following up. You answered my question – Nathan Apr 30 '11 at 21:57
  • The validation service is right: one CANNOT have ANYTHING before the XML declaration, not even whitespace. Check the XML specification for details. – StaxMan Oct 07 '11 at 15:35

Assuming you wish to serve your XHTML as text/html content type, you are right to want to remove the XML declaration, because if you don't, it will throw IE6 into quirks mode.

Overriding writeXMLDeclaration() as suggested by MJB looks like a good way to do it.

But you should be aware that you may well hit other problems using an XML serializer and serving the output as text/html.

The most likely problem is that the serializer will emit a tag like this: <script src="myscript.js" />. Browsers (except Safari) won't treat that as a self-closing script tag, but as a script start tag, so everything that follows will be treated as part of the script and not rendered by the browser.

You will probably need to override your serializer to make it HTML aware to resolve this. I suggest overriding the writeEmptyElementTag() function, and for all elements with names not in the list "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", "hr", "isindex", "image", "img", "input", "keygen", "link", "meta", "param", "source", "spacer" and "wbr", call writeStartTag() and then writeEndTag() instead of the default behaviour.
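
The decision described above can be sketched in plain Java, separate from XOM. The class and method names here (HtmlEmptyElements, maySelfClose, renderEmpty) are hypothetical, and attributes are left out to keep the sketch short. In a real Serializer subclass the same check would presumably sit inside the writeEmptyElementTag() override.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
class HtmlEmptyElements {

    // Element names copied from the list in the answer above.
    private static final Set<String> SELF_CLOSING_OK = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "bgsound", "br", "col", "command",
            "embed", "frame", "hr", "isindex", "image", "img", "input",
            "keygen", "link", "meta", "param", "source", "spacer", "wbr"));

    // True if the element may be written as <name /> when served as text/html.
    static boolean maySelfClose(String localName) {
        return SELF_CLOSING_OK.contains(localName);
    }

    // Renders an attribute-free empty element: self-closing for listed
    // names, an explicit start tag plus end tag for everything else.
    static String renderEmpty(String localName) {
        if (maySelfClose(localName)) {
            return "<" + localName + " />";
        }
        return "<" + localName + "></" + localName + ">";
    }

    public static void main(String[] args) {
        System.out.println(renderEmpty("br"));     // <br />
        System.out.println(renderEmpty("script")); // <script></script>
    }
}
```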

Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?

No, it doesn't. When served as text/html, the XML declaration is simply ignored by browsers, so you will still need to provide the character encoding by some other means: either the meta tag or the HTTP headers.

Alohci