File.html

word<i><span> <span>ratti</span></span></i>

Command

$ tidy File.html

Output

word<i>ratti</i>

Desired output

word<i> ratti</i>

Where's the space?

Log

line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 8 - Warning: <span> is probably intended as </span>
line 1 column 5 - Warning: replacing unexpected span by </span>
line 1 column 33 - Warning: discarding unexpected </span>
line 1 column 40 - Warning: discarding unexpected </i>
line 1 column 1 - Warning: inserting missing 'title' element
line 1 column 8 - Warning: trimming empty <span>
Info: Document content looks like HTML 4.01 Transitional
8 warnings, 0 errors were found!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1st November 2003), see www.w3.org">
<title></title>
</head>
<body>
word<i>ratti</i>
</body>
</html>
Liam
Chankey Pathak
  • Seems to be discarded by tidy. Did you try a non-breaking space (`&nbsp;`)? – Tarik Zouine Jul 01 '14 at 08:39
  • I can't do anything with the HTML. It's just a sample from a file of 10K+ lines, and there are thousands of such files, so I can't edit their HTML. There should be an option in `tidy` to preserve spaces in such a case. BTW, to answer your question: yes, it works fine with `&nbsp;`. – Chankey Pathak Jul 01 '14 at 08:41
  • You can try the option `--add-xml-space yes`. – Tarik Zouine Jul 01 '14 at 09:11
  • Doesn't help, same result. Moreover, according to the [documentation](http://w3c.github.io/tidy-html5/quickref.html#add-xml-space), that option applies only to `PRE`, `STYLE` and `SCRIPT`, and only `when generating XML`. – Chankey Pathak Jul 01 '14 at 09:15
  • Also see [Ubuntu Issue 1660537](https://bugs.launchpad.net/bugs/1660537): *HTML Tidy is dooing a poor job; please update to newer HTML Tidy*. – jww Jan 31 '17 at 07:14

1 Answer

This issue seems to have been solved in newer versions. I was using the version from 2003. I just updated tidy on my machine to a version from 2009, and with that the output is as shown below.

content: word<i><span> <span>ratti</span></span></i>
command: tidy file.html
output: word <i><span><span>ratti</span></span></i>

So it preserves the space now, although it moves the space in front of the `<i>` tag and does not remove the `span` tags. Either way, this looks like a proper answer to the question.
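To double-check that the space really survives as far as the rendered text goes, one option is to strip the tags and compare what is left. This is only a rough sketch using Python's standard `html.parser`; the helper names `TextExtractor` and `visible_text` are made up for illustration, and the three strings are copied from the question and from the output above rather than produced by running tidy here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the character data of a fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, whitespace included.
        self.chunks.append(data)


def visible_text(markup):
    """Return the text content of an HTML fragment with the tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Strings copied from the question and the answer (not generated by tidy here).
source = "word<i><span> <span>ratti</span></span></i>"
tidy_2003 = "word<i>ratti</i>"
tidy_2009 = "word <i><span><span>ratti</span></span></i>"

for label, markup in [("source", source), ("2003", tidy_2003), ("2009", tidy_2009)]:
    print(label, repr(visible_text(markup)))
```

The source and the 2009 output both reduce to `'word ratti'`, while the 2003 output reduces to `'wordratti'`. So the newer version keeps the text content intact even though the space ends up outside the `<i>` element instead of inside it.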

Chankey Pathak