
I am trying to clean up an HTML string and create an HTML5 document using Tidy and PHP, but I end up with an HTML 3.2 document instead. As shown in the output below, I get the error "Config: missing or malformed argument for option: doctype". I am running PHP 5.5.35 on CentOS 6 with Apache 2.2, and phpinfo() shows the following:

tidy

Tidy support    enabled
libTidy Release 14 June 2007
Extension Version   2.0 ($Id: e066a98a414c7f79f89f697c19c4336c61bc617b $)

Directive   Local Value Master Value
tidy.clean_output   no value    no value
tidy.default_config no value    no value

How do I create an HTML5 document? Below is my attempt:

<?php
$html = <<<EOD
<p>Hello</p>
<div>
 <p data-customattribute="will be an error">bla</p>
 <p>bla</p>
</div>
<div>
 <p>Hi there!</p>
 <div>
  <p>Opps, a mistake</px>
 </div>
</div>
EOD;
$html="<!DOCTYPE HTML><html><head><title></title></head><body>$html</body></html>";

echo($html."\n\n");

    $config = array(
        'indent' => true,
        'indent-spaces' => 4,
        'doctype' => '<!DOCTYPE HTML>',
    );

$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();
print_r($tidy);

OUTPUT

<!DOCTYPE HTML><html><head><title></title></head><body><p>Hello</p>
<div>
 <p data-customattribute="will be an error">bla</p>
 <p>bla</p>
</div>
<div>
 <p>Hi there!</p>
 <div>
  <p>Opps, a mistake</px>
 </div>
</div></body></html>

tidy Object
(
    [errorBuffer] => Config: missing or malformed argument for option: doctype
line 9 column 21 - Warning: discarding unexpected </px>
line 3 column 2 - Warning: <p> proprietary attribute "data-customattribute"
    [value] => <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
    <head>
        <title></title>
    </head>
    <body>
        <p>
            Hello
        </p>
        <div>
            <p data-customattribute="will be an error">
                bla
            </p>
            <p>
                bla
            </p>
        </div>
        <div>
            <p>
                Hi there!
            </p>
            <div>
                <p>
                    Opps, a mistake
                </p>
            </div>
        </div>
    </body>
</html>
)
Dekel
user1032531

1 Answer

Old versions of Tidy do not support HTML5 documents

The first release of Tidy that supports HTML5 came in September 2015, when the HTML Tidy Advocacy Community Group released the first version of tidy-html5.

Your phpinfo() output shows a libTidy release dated 14 June 2007, which predates that, so you will not be able to validate HTML5 documents with it.
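To see which libTidy release a given PHP binary is linked against without opening phpinfo(), something like the following can be run from a shell. This is only a sketch: it assumes the php command-line binary is on PATH, and the CLI build may not match the one Apache loads. The variable name tidy_release is illustrative.

```shell
# Report the libTidy release the PHP CLI is linked against, if any.
# Assumption: "php" on PATH is the build you care about.
if command -v php >/dev/null 2>&1; then
    tidy_release=$(php -r 'echo extension_loaded("tidy") ? tidy_get_release() : "tidy extension not loaded";')
else
    tidy_release="php not found on PATH"
fi
echo "libTidy release: $tidy_release"
```

A release string with a 2007 date, like the one in the question, indicates the pre-HTML5 library.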

Current precompiled releases of PHP are not yet built against tidy-html5, so if you want to use tidy-html5 you will have to compile PHP against it yourself.

The following instructions are taken from the README file in the tidy-html5 GitHub repository:

Due to API changes in the PHP source, "buffio.h" needs to be changed to "tidybuffio.h" in the file ext/tidy/tidy.c.

That is - prior to configuring php run this in the php source directory:

   sed -i 's/buffio.h/tidybuffio.h/' ext/tidy/*.c
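To see what that substitution does before touching the real PHP source tree, it can be tried on a scratch file first. The sketch below is not part of the README: the file name demo_tidy.c and its contents are made up, and the pattern is a slightly stricter variant that escapes the dot and matches the surrounding quotes, so running it a second time does not turn tidybuffio.h into tidytidybuffio.h. It relies on GNU sed for the -i flag, as the README command does.

```shell
# Scratch directory standing in for ext/tidy (hypothetical contents).
workdir=$(mktemp -d)
printf '#include "tidy.h"\n#include "buffio.h"\n' > "$workdir/demo_tidy.c"

# Stricter variant of the README's rename: the dot is escaped and the
# quotes are part of the match, so a repeated run changes nothing.
sed -i 's/"buffio\.h"/"tidybuffio.h"/' "$workdir"/*.c
sed -i 's/"buffio\.h"/"tidybuffio.h"/' "$workdir"/*.c

# Show the rewritten file, then clean up the scratch directory.
result=$(cat "$workdir/demo_tidy.c")
echo "$result"
rm -rf "$workdir"
```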

And then continue with (just an example here, use your own php config options):

   ./configure --with-tidy=/usr/local
   make
   make test
   make install
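Once PHP has been rebuilt, one way to confirm that HTML5 output works is to ask Tidy for an HTML5 doctype. The sketch below makes two assumptions: that the rebuilt php binary is the first one on PATH, and that the doctype option takes the keyword html5 (as documented for tidy-html5) rather than a literal <!DOCTYPE ...> string like the one in the question. The sample markup and the html5_check variable are illustrative only.

```shell
# Assumes the rebuilt PHP CLI is the first "php" on PATH.
if command -v php >/dev/null 2>&1 && php -r 'exit(extension_loaded("tidy") ? 0 : 1);'; then
    # "html5" is a doctype keyword in tidy-html5; the 2007 libTidy rejects it.
    html5_check=$(php -r '
        $tidy = new tidy;
        $tidy->parseString("<p>Hello</p>", array("doctype" => "html5"), "utf8");
        $tidy->cleanRepair();
        echo strtok((string) $tidy, "\n");  // first line of the repaired document
    ')
else
    html5_check="skipped: php with the tidy extension was not found"
fi
echo "$html5_check"
```

If the build picked up tidy-html5, the first line printed should be an HTML5 doctype. If an older doctype still appears, PHP is probably still linked against the old library.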
Dekel
  • Thanks Dekel, I suspected so much after seeing I had a 7 year old version of tidy. Ah, compile my own version. Joy! – user1032531 Jul 17 '16 at 16:11