
At work, we stumbled upon Bugzilla creating HTML output with lines that were much too long because the browser didn't break them. This happened in Chrome, but not in Firefox 3.5, so we didn't really care. But Firefox 4 behaves just like Chrome, so we had to find a workaround.

An example is:

<html>
  <body>
    <pre>
      Lorem ipsum dolor sit amet, consetetur sadipscing elitr,&#013;sed diam nonumy eirmod tempor invidunt ut labore et&#013;dolore magna aliquyam erat, sed diam voluptua. At vero eos&#013;et accusam et justo duo dolores et ea rebum. Stet clita kasd&#013;gubergren, no sea takimata sanctus est Lorem ipsum dolor sit&#013;amet.&#013;
    </pre>
  </body>
</html>

The server uses only CR as a line break, which is very uncommon; the usual alternatives (CR+LF or LF alone) work correctly, so the right fix is to tell the Bugzilla server to use one of those. Still, I'm curious why this doesn't work, and whether ignoring these line breaks is the "correct" behavior for browsers.

Also, I found a strange local workaround for Chrome and FF 4 using a Greasemonkey script (modified version of this one):

// Re-assign each element's own markup, which makes the browser re-parse it.
var els = document.getElementsByTagName("*");
for (var i = 0, l = els.length; i < l; i++) {
  var el = els[i];
  el.innerHTML = el.innerHTML;
}

It seems this should have no effect on the page, but with this script the line breaks suddenly show up correctly.

So my questions are:

  1. Is the Chrome/FF 4 way the "correct" way to handle these kinds of linebreaks inside <pre>?
  2. Why is this Greasemonkey script working?
schnaader
  • Yes, Chrome/FF is "correct", and in line with the behavior of C and Unix. CR (Carriage Return) should just go back to the same line -- just as in the old typewriter days. Except that HTML doesn't allow type to overlap that way. ... LF (Line Feed) advances the line and, in C / unix / Browsers, it also resets to the start of the line (unlike typewriters). ... ... AFAIR, Mac treated CR as a linefeed for some reason -- making it the exception to the rule. – Brock Adams May 06 '11 at 20:29
  • @Brock: Yes, Mac OS up to version 9 and some other systems do use CR as a line break, according to http://en.wikipedia.org/wiki/Newline#Representations – schnaader May 06 '11 at 21:43
  • From official 4.01 pre spec: http://www.w3.org/TR/html401/struct/text.html#h-9.3.4 – Christophe Roussy Aug 04 '14 at 10:22

2 Answers


Yes, the HTML 4.01 specification defines a line break as follows (http://www.w3.org/TR/html401/struct/text.html#line-breaks):

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.

However, a bare carriage return is extremely rare. I'm not surprised it doesn't work. But technically, I'd say that FF4 and Chrome are in the wrong.
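
To make the quoted definition concrete, here is a minimal JavaScript sketch that splits text on all three line-break forms listed there. The name `splitHtmlLines` is made up for illustration and is not a browser API; it only shows what treating a bare CR as a line break would mean for text like the sample in the question.

```javascript
// Hypothetical helper: split text on the three line-break forms that
// HTML 4.01 lists (CR+LF pair, bare CR, bare LF).
function splitHtmlLines(text) {
  // \r\n must come first so a CR+LF pair counts as one break, not two.
  return text.split(/\r\n|\r|\n/);
}

// Shortened sample mixing all three forms.
var sample = "Lorem ipsum dolor sit amet,\rsed diam nonumy\r\ndolore magna\namet.";
var lines = splitHtmlLines(sample);
console.log(lines.length); // 4
```

A browser that followed this reading inside `pre` would render four lines here; Chrome and FF 4 instead keep the bare CR as an ordinary character that does not start a new line.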

Not sure why your Greasemonkey script is working. My guess is that reading el.innerHTML converts CR to CR+LF or LF.

Bill Brasky
  • That section is for general text. The [preformatted specification](http://www.w3.org/TR/html401/struct/text.html#preformatted) says that whitespace -- which includes CR and LF -- may be left intact. And `pre` elements are expected to, and normally do, leave whitespace alone. This means that CR and LF are free to behave as language programmers have come to expect -- as Chrome and FF4 do. – Brock Adams May 06 '11 at 21:28

The GM script works because, apparently, CRs (\r) are converted to LFs (\n) dynamically when JS writes to the DOM.

See this test at jsFiddle. Notice how the CR (decimal 13), at the end of the 2nd line, gets converted to LF (decimal 10).
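
If that conversion is indeed what makes the script work, a narrower workaround might replace the CRs directly instead of rewriting every element's innerHTML (which also discards event handlers attached to the replaced nodes). The sketch below has not been tested against Bugzilla; `crToLf`, `charCodes`, and `fixPreElements` are hypothetical helper names, and the DOM part assumes it runs as a user script inside the page.

```javascript
// Hypothetical helper: turn CR+LF pairs and bare CRs into LF.
function crToLf(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Hypothetical helper: list character codes, handy for checking whether
// a string holds CR (13) or LF (10).
function charCodes(text) {
  return text.split("").map(function (ch) { return ch.charCodeAt(0); });
}

// Assumed usage in a user script: fix only the text nodes inside <pre>.
function fixPreElements(doc) {
  var pres = doc.getElementsByTagName("pre");
  for (var i = 0; i < pres.length; i++) {
    // 4 is the numeric value of NodeFilter.SHOW_TEXT.
    var walker = doc.createTreeWalker(pres[i], 4, null, false);
    var node;
    while ((node = walker.nextNode())) {
      node.nodeValue = crToLf(node.nodeValue);
    }
  }
}

// Only touch the DOM when one exists (for example, not under Node.js).
if (typeof document !== "undefined") {
  fixPreElements(document);
}

console.log(charCodes(crToLf("a\rb")).join(", ")); // 97, 10, 98
```

Restricting the change to text nodes under `pre` leaves the rest of the page untouched, and `charCodes` gives a quick way to repeat the decimal 13 versus decimal 10 check described above.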

Brock Adams