I have data in HTML format that is submitted to a server, which does some preprocessing.

The preprocessing operates on the "src" attribute of "img" tags.

After preprocessing and saving, none of the preprocessed "img" tags are self-closed any more.

For example, if the "img" tag was the following:

<img src="image.png" />

after preprocessing with Nokogiri or Hpricot, it will be:

<img src="/preprocessed_path/image.png">

The code is pretty simple:

doc = Hpricot(self.content)
doc.search("img").each do |tag|
  preprocess tag
end
self.content = doc.to_html

For Nokogiri, it looks the same.

How can I resolve this issue?


Update 1

I forgot to mention: I have an HTML5 page, which I'm trying to validate with the W3C Validator.

When an "img" tag is inside a div, it complains about the following:

required character (found d) (expected i)
</div>

For example, try to validate the following code:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="UTF-8" />
</head>
<body>
    <div>
        <img src="image.png">
    </div>
</body>
</html>

You will get the same error:

Line 9, Column 4: required character (found d) (expected i)
</div>
AntonAL
  • In the preprocess function you are delegating to, do you not have control over each `img` tag? Can you not return what it is already returning and append an explicit `` close tag? – Macy Abbey Nov 18 '10 at 23:43
  • Sure, I can parse everything by hand, using sophisticated regular expressions, etc. But this task should be up to the library I'm using. – AntonAL Nov 19 '10 at 00:29

2 Answers

I think the problem is with your <html> tag where it declares the XMLNS attribute as "XHTML". This seems like it would be contradictory to the fact that it's not an XHTML document. If you remove this XMLNS attribute, it should be valid.

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <title>something here</title>
</head>
<body>
  <div>
    <img src="image.png">
  </div>
</body>
</html>
the Tin Man
Peter Brown
  • Oh, thanks! I completely forgot about this namespace. It was left over from a copy-paste done many months ago. – AntonAL Nov 19 '10 at 02:52

The problem is that your libraries are generating correct HTML, and the trailing "/" is not needed in HTML.

Unless you're serving application/xhtml+xml, there's no point in having it there at all. The IMG tag is self-closing in all versions of HTML, and the "/" is meaningless.

If you are serving application/xhtml+xml, I think you'll need to explicitly use Nokogiri's to_xhtml.

the Tin Man
Chuck
  • @AntonAL: So you are serving XHTML instead of HTML. Like I said, in that case, you will need to generate XHTML instead of HTML. Or use HTML. You just need to pick one and stick with it. – Chuck Nov 19 '10 at 02:06