I am using JTidy (the Java port of the HTML Tidy library) to scrub some existing sites. With my configuration, JTidy seems to be very strict and ends up cutting off the bottom of the page when the markup is bad.

When I run the same markup through the W3C HTML validator tool alone, it also cleans it up but is more intelligent in its rewriting: instead of chopping off tags, it seems to guess where the missing tag was and updates the structure accordingly.

Does anyone know the HTML Tidy configuration the W3C validator uses?

My JTidy configuration is as follows:

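    import org.w3c.tidy.Tidy;

    Tidy tidy = new Tidy();
    tidy.setTidyMark(false);          // do not add the "generator" meta tag
    tidy.setXHTML(true);              // emit XHTML
    tidy.setXmlOut(false);            // but not generic XML
    tidy.setNumEntities(true);        // numeric rather than named entities
    tidy.setSpaces(2);                // indent by two spaces
    tidy.setWraplen(2000);            // wrap only very long lines
    tidy.setUpperCaseTags(false);
    tidy.setUpperCaseAttrs(false);
    tidy.setQuiet(false);
    tidy.setMakeClean(true);          // replace presentational markup with style rules
    tidy.setShowWarnings(true);
    tidy.setBreakBeforeBR(true);      // newline before each <br>
    tidy.setHideComments(true);       // drop comments from the output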
TylerH
empire29

1 Answer

The Tidy configuration used by the W3C validator is available here.

Jérôme Pouiller