8

I'm putting some page content (which has been run through Tidy, but doesn't need to be if this is a source of problems) into DOMDocument using DOMDocument::loadHTML.

It's coming up with various errors:

ID x already defined in Entity, line X

Is there any way to make either DOMDocument (or Tidy) ignore or strip out duplicate element IDs, so it will actually create the DOMDocument?

Thanks. :)

Aron Rotteveel
James Inman

3 Answers

13

A quick search on the subject reveals this (incorrect) bug report:

http://bugs.php.net/bug.php?id=46136

The last reply states the following:

You're using HTML 4 rules to load an XHTML document. Either use the load() method to parse as XML or the libxml_use_internal_errors() function to ignore the warnings.

I can't be sure if you are encountering this problem for the same reasons, since you did not include a reference to the HTML page being loaded. In any case, using libxml_use_internal_errors() should at least suppress the error.
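As a rough sketch of that approach (the markup in `$html` is a made-up sample with a repeated id, not the asker's page), the parser messages can be collected internally so that `loadHTML()` still builds the document without emitting warnings:

```php
<?php
// Hypothetical input: two elements share the id "dup".
$html = '<html><body><div id="dup">a</div><div id="dup">b</div></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Inspect whatever the parser reported, then empty the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

// The document is still built; both <div> elements are present.
$divCount = $doc->getElementsByTagName('div')->length;
```

Whether a duplicate-id message actually shows up in `$errors` may depend on the libxml2 version in use, so it is safer to treat that list as diagnostic output than to rely on its exact contents.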

IDs in HTML documents are required to be unique, so the best solution would still be to fix and validate your document, if at all possible.

Aron Rotteveel
0

By definition, IDs are unique. If they are not, you should use classes instead (or names, where applicable).
I doubt you can force XML tools to ignore duplicate IDs, since that would mean making them handle an invalid XML document.

PhiLho
0

Catch the duplicate-ID errors and rename the second ID. Alternatively, merge the elements that share an ID into child elements of a single parent that carries the ID.

IDs must be unique within an XML file (that is, within the whole tree under its root element).
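One possible way to do the renaming, as a sketch only: load the markup with warnings suppressed, then walk every element that has an `id` and give repeats a numeric suffix. The sample markup and the `-2`, `-3` suffix scheme are assumptions for illustration, and the code does not check whether a generated id collides with one already in the page.

```php
<?php
// Hypothetical input: the id "x" appears twice.
$html = '<html><body><p id="x">one</p><p id="x">two</p><p id="y">three</p></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$seen = [];

// Visit every element with an id attribute, in document order.
foreach ($xpath->query('//*[@id]') as $el) {
    $id = $el->getAttribute('id');
    if (isset($seen[$id])) {
        // Repeat: append a counter so the id becomes distinct.
        $seen[$id]++;
        $el->setAttribute('id', $id . '-' . $seen[$id]);
    } else {
        $seen[$id] = 1;
    }
}

// Collect the resulting ids for inspection.
$ids = [];
foreach ($xpath->query('//*[@id]') as $el) {
    $ids[] = $el->getAttribute('id');
}
```

This runs after parsing, so it cleans up the resulting tree; it does not stop the parser from reporting the duplicates in the first place.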