
I am trying to automatically indent HTML5 output. The tool I tried was JTidy, but it does not support HTML5 elements; for instance, it moves all and to the head, whereas HTML5 uses them in the body.

As HTML is not XML, I cannot use the typical Java XML tools for indenting.

Vojtěch
  • HTML can be XML. You can make your HTML be like XML (close all your tags, etc.), and then the tools should work. – gen_Eric Mar 21 '12 at 16:41
  • Not necessarily the case: for instance, "itemscope" as an empty attribute in HTML5 is not valid in XML. – Vojtěch Mar 21 '12 at 16:47
  • Two other examples: the doctype for HTML5 is not valid XML, and some elements, like meta, should no longer be closed. – Joe Hildebrand Oct 13 '14 at 20:38
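To make the comments' point concrete, here is a minimal sketch that asks the JDK's built-in DOM parser whether a snippet is well-formed XML. It assumes nothing beyond the standard `javax.xml.parsers` API; the class and method names (`WellFormedCheck`, `isWellFormedXml`) are hypothetical. A bare HTML5 attribute such as `itemscope` and an unclosed `<meta>` are both rejected, while the XML-style spelling `itemscope="itemscope"` parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a markup string is well-formed XML.
public class WellFormedCheck {
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings/errors and rethrows fatal errors,
            // so nothing is printed to stderr and parse() throws on bad input.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<div itemscope=\"itemscope\"></div>"));
        System.out.println(isWellFormedXml("<div itemscope></div>"));
        System.out.println(isWellFormedXml("<head><meta charset=\"utf-8\"></head>"));
    }
}
```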

1 Answer


Most robust solution

It's not Java, but HTML Tidy for HTML5 is maintained by the W3C and is a command-line tool, which makes it very flexible. It is a current fork of HTML Tidy and is actively maintained, as the commit times on the project's GitHub home page show.
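Because it is a command-line tool, one way to use it from Java is to pipe the document through it. The following is a sketch under stated assumptions: a tidy-html5 binary is installed and reachable by the given name (e.g. `tidy` on `PATH`), Java 9+ is available (`readAllBytes`, `Redirect.DISCARD`), and the class `TidyRunner` is hypothetical. The options used (`-quiet`, `--indent`, `--indent-spaces`, `--tidy-mark`) are standard Tidy configuration options, but check them against your build's option list.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that pipes HTML through an external tidy-html5 process.
public class TidyRunner {
    // Builds the tidy command line; kept separate so it can be inspected or tested.
    public static List<String> buildCommand(String executable, int indentSpaces) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add("-quiet");                                   // suppress non-document chatter
        cmd.add("--indent"); cmd.add("yes");                 // indent block-level content
        cmd.add("--indent-spaces"); cmd.add(Integer.toString(indentSpaces));
        cmd.add("--tidy-mark"); cmd.add("no");               // don't add a generator meta tag
        return cmd;
    }

    // Sends html to tidy's stdin and returns what tidy writes to stdout.
    public static String indent(String html, String executable)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(executable, 2));
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);  // drop warnings so stderr can't block
        Process p = pb.start();
        // Tidy parses the whole document before printing, so writing all input first is fine.
        try (OutputStream in = p.getOutputStream()) {
            in.write(html.getBytes(StandardCharsets.UTF_8));
        }
        byte[] out = p.getInputStream().readAllBytes();
        int status = p.waitFor();
        // Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
        if (status > 1) {
            throw new IOException("tidy exited with status " + status);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

Usage would look like `TidyRunner.indent(html, "tidy")`. Note that Tidy may still repair or rewrite markup, not only re-indent it, so compare its output against your input before relying on it.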

Java Solution

If you can't get the latest version of HTML Tidy for HTML5 to work, XML is still an option.

HTML5 certainly is not at all designed to be XML friendly, but it does at least give lip service in the form of an XML serialization for HTML5, which, in this article, I'll call XHTML5 ...

There is an XML serialization of HTML5, which lets you use any standard XML formatting tool to format it however you like.
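As a minimal sketch of that route, the JDK's identity `Transformer` can re-indent a document that is already in the XML serialization. It assumes the input is well-formed XHTML5 and relies on the JDK's built-in XSLT implementation honouring the `{http://xml.apache.org/xslt}indent-amount` output property; the class name `XhtmlIndenter` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: pretty-prints well-formed XHTML5 with the standard JAXP API.
public class XhtmlIndenter {
    public static String indent(String xhtml) {
        try {
            // newTransformer() with no stylesheet gives an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Indent width; understood by the JDK's built-in (Xalan-derived) serializer.
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Thrown for setup problems and for input that is not well-formed XML.
            throw new IllegalArgumentException("could not indent input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indent(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hi</p></body></html>"));
    }
}
```

On newer JDKs, input that already contains whitespace between tags may come out with extra blank lines, so stripping that whitespace first (or using a dedicated formatter) may give cleaner results.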

  • Well, I am aware of this, but we want to stick to HTML5. – Vojtěch Mar 21 '12 at 16:46
  • I am downvoting only because HTML Tidy is an incomplete algorithm and is no longer actively maintained. I believe people recommend it because they are familiar with the name of the application; they recommend it more often than they use it, and they are even less aware of the alternatives. – austincheney Mar 29 '12 at 02:12
  • @austincheney Then you are downvoting out of ignorance. Read for comprehension: this is **HTML Tidy for HTML5**, which is a fork and is actively maintained. If you look at the commit times on GitHub, they show activity within *hours* and *days*; that is **actively** maintained as far as I can tell. – Mar 29 '12 at 12:27
  • @JarrodRoberson No, the algorithm is still incomplete in its understanding of text nodes and singletons (empty tags), and it continues to irrevocably alter source code regardless of this incompleteness, so it is harmful. Its primary purpose is to correct simple coding errors, which are better identified by a validation service or a schema; beautification is a distant second priority. This is the primary page for the ill-fated revival: http://w3c.github.com/tidy-html5/ – austincheney Mar 30 '12 at 03:18
  • @austincheney: can you list some of the alternatives, then? I agree with you that tidy-html5 is not doing a very good job. – Joe Hildebrand Oct 13 '14 at 20:37