
I'm using PHP Tidy as an included script, and while it mostly (if imperfectly) works, it won't remove the name attributes from my tags. I've tried everything I can think of, including stripping them with PHP Simple HTML DOM before running Tidy, but they keep getting put back in.

I've researched this issue extensively, but the only results I find are from people recommending anchor-as-name, so it must work, and there's something about what I'm doing that is getting in the way.

My Tidy config is as follows. Perhaps something else is overriding the anchor-as-name option? I moved it to the bottom in case that would help, but it made no difference. I also tried setting it to false, and that didn't help either.

$tidy_config = Array(

    'break-before-br'       => 'no',
    'clean'                 => 'clean',
    'doctype'               => 'strict',
    'drop-empty-paras'      => 'yes',
    'drop-font-tags'        => 'yes',
    'force-output'          => 'yes',
    'indent'                => 'yes',
    'indent-attributes'     => 'no',
    'indent-spaces'         => 2,
    'input-encoding'        => 'utf8',
    'join-styles'           => 'no',
    'literal-attributes'    => 'yes',
    'logical-emphasis'      => 'yes',
    'lower-literals'        => 'yes',
    'merge-divs'            => 'no',
    'merge-spans'           => 'yes',
    'output-encoding'       => 'ascii',
    'output-xhtml'          => 'yes',
    'output-bom'            => 'no',
    'preserve-entities'     => 'yes',
    'quiet'                 => 'yes',
    'quote-ampersand'       => 'yes',
    'quote-marks'           => 'no',
    'quote-nbsp'            => 'yes',
    'show-body-only'        => 'yes',
    'show-errors'           => 0,
    'show-warnings'         => 0,
    'sort-attributes'       => 'alpha',
    'tidy-mark'             => 'no',
    'vertical-space'        => 'yes',
    'wrap'                  => '0',
    'wrap-attributes'       => 'no',
    'anchor-as-name'        => 'no'
);

Come to think of it, show-body-only doesn't seem to be working, either... maybe the whole thing is just being ignored and I'm doing something else fundamentally wrong?

Any clues and assistance would be greatly appreciated.

Oezi: Thanks for the tip regarding updating the question. This is the first question I've asked here.

I am using id attributes. This is what typically happens (all relevant variables are defined earlier):

require_once $docRoot . '/htmldom/simple_html_dom.php';
require $this_dir . '/includes/create-tidy-object.php';
$string1 = "<a id='anchor1'>First Anchor Text</a>";
$string2 = "<a id='anchor2' name='anchor2'>Second Anchor Text</a>";
$string3 = "<a id='anchor3'>Third Anchor Text</a>";
$tidy->parseString($string1,$tidy_config,'utf8');
$tidy->cleanRepair();
$revised_string_1 = $tidy;
print "<pre>Revised String 1:\n" . htmlentities($revised_string_1) . "\n\n";
$tidy->parseString($string2,$tidy_config,'utf8');
$tidy->cleanRepair();
$revised_string_2 = $tidy;
print "Revised String 2:\n" . htmlentities($revised_string_2) . "\n</pre>\n";
$stringdom3 = str_get_html($string3);
foreach($stringdom3->find('a[id]') as $anchor) { $anchor->name = null; }
$revised_string_3 = $stringdom3;
print "<pre>\nRevised String 3, after PHP Simple HTML DOM Parser:\n";
print htmlentities($revised_string_3) . "\n</pre>\n";
$tidy->parseString($revised_string_3,$tidy_config,'utf8');
$tidy->cleanRepair();
$revised_string_3a = $tidy;
print "<pre>Revised String 3, after going through both:\n";
print htmlentities($revised_string_3a) . "\n\n";

Produces (with line breaks added for legibility):

Revised String 1:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title></title>
</head>
<body>
<a id='anchor1' name="anchor1">First Anchor Text</a>
</body>
</html>

Revised String 2:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title></title>
</head>
<body>
<a id='anchor2' name='anchor2'>Second Anchor Text</a>
</body>
</html>

Revised String 3, after PHP Simple HTML DOM Parser:
<a id='anchor3'>Third Anchor Text</a>

Revised String 3, after going through both:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title></title>
</head>
<body>
<a id='anchor3' name="anchor3">Third Anchor Text</a>
</body>
</html>

So Tidy is not only adding name attributes despite anchor-as-name being set to no, it's also producing tags outside the body despite show-body-only being set to yes.

While the obvious solution would seem to be to just not use Tidy, since Simple HTML DOM alone gives me what I want for the lines above, I'm parsing million-character-plus files (500-1000 page documents) written in Word's pathetic version of HTML, on a daily basis, so Tidy's many other features really are helpful.

Erika S

1 Answer


from the documentation:

[...] If set to "no", any existing name attribute is removed if an id attribute exists or has been added.

you haven't given any information about that, so i assume you just haven't set an id on the anchors where "it doesn't work".
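
If the ids are there and the option still seems to be ignored, one thing worth checking is whether the libtidy build behind the PHP extension lists the option at all. The sketch below is only an assumption about how to check that, not a diagnosis of this case: `unknownTidyOptions` is a hypothetical helper (it is not part of Tidy or of the original post) that compares the keys of a requested config against the keys returned by `tidy::getConfig()`.

```php
<?php
// Hypothetical helper, not part of the tidy extension: returns the option
// names in $requested that do not appear as keys in $known.
function unknownTidyOptions(array $requested, array $known): array
{
    return array_keys(array_diff_key($requested, $known));
}

if (extension_loaded('tidy')) {
    // Parse a throwaway document with default settings, then read back
    // the full list of options this libtidy build reports.
    $tidy = new tidy();
    $tidy->parseString('<p>probe</p>');
    $known = $tidy->getConfig();

    // Two of the options from the question, used here as examples.
    $requested = array(
        'show-body-only' => 'yes',
        'anchor-as-name' => 'no',
    );

    $unknown = unknownTidyOptions($requested, $known);

    echo 'libtidy release: ' . tidy_get_release() . "\n";
    echo 'options not listed by this build: '
        . ($unknown ? implode(', ', $unknown) : '(none)') . "\n";
}
```

If an option name shows up in that list, the installed library does not report it under that name, which would be a reason to look at the libtidy version before looking at the markup.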

oezi
  • you shouldn't post additional information in comments anyway - please edit your question and append an example instead. – oezi Mar 21 '12 at 12:51