30

I noticed that many websites, even Google and some banking sites, have poorly written HTML: attribute values without quotes, or characters such as ampersands that are not escaped correctly in links. In other words, many use markup that would not validate.

I am curious about their reasons. HTML has simple rules and it is just mind-boggling that they don't seem to follow those rules. Or do they use programs that just spit out the code?
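To make the two complaints concrete, here is a minimal Python sketch using only the standard library. It assumes a lenient parser (`html.parser`, which is not a validator) and a made-up `<input>` tag and search URL; `AttrCollector` and `parse_tags` are hypothetical helper names. It shows that a tolerant parser reads quoted and unquoted attribute values the same way, and what an escaped ampersand in a link looks like.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag, attrs) pairs for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def parse_tags(markup):
    """Parse a markup string and return the start tags that were found."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# Hypothetical markup: the same tag with and without quotes.
unquoted = parse_tags("<input type=text name=q>")
quoted = parse_tags('<input type="text" name="q">')
print(unquoted == quoted)

# Hypothetical URL: a raw "&" in an href should be written as "&amp;".
escaped_href = escape("/search?q=html&hl=en")
print(escaped_href)
```

A lenient parser accepting both forms says nothing about whether either one validates; that depends on the doctype and the validator used.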

Daniel Vassallo
netrox
  • See also http://stackoverflow.com/questions/1967191/why-would-google-use-a-font-tag – Chetan S Jan 08 '10 at 19:04
  • I'd suggest making this a community wiki, and perhaps revising the question to be less argumentative if you want to avoid the almost-inevitable closure. – David Thomas Jan 08 '10 at 19:05
  • I believe most Google sites are GWT-based; GWT is a Java framework that auto-generates HTML, CSS, and JavaScript. – slebetman Jan 08 '10 at 19:07
  • Aside from being argumentative and subjective, you're also factually wrong. For example, HTML 4 and 5 do *not* require quotes around attribute values. – Jonathan Feinberg Jan 08 '10 at 19:09
  • No argument intended, just: why? Where's the community wiki? I would be interested. Thanks for the other link, Chetan. – netrox Jan 08 '10 at 19:10
  • @ricebowl: Personally, I'd also like to know people's opinions here, especially when people get told to make sure their site validates first but very prominent sites like Google do not validate (last time I checked it had 48 errors). – slebetman Jan 08 '10 at 19:10
  • I'm interested in the spirit of the question but the question, as asked at this point, "why do major websites have terrible html?" is argumentative and subjective (the argument is that "major websites" necessarily have "terrible html;" and "terrible" is opinion, whereas "invalid" is not). @netrox, 'community wiki' is a tick-box near the text-entry field for the body of your question, if you click on the 'edit' link it should be apparent. My apologies for any offence you may have taken from my earlier comment, I intended suggestion rather than, *ahem*, argument. =] – David Thomas Jan 08 '10 at 19:15
  • Voting to reopen. It's an interesting question with an interesting answer (that I was in the middle of writing when this got closed) – Kenan Banks Jan 08 '10 at 19:16
  • @Triptych, agreed, I've edited the question, and title, a little to try and reduce the argumentative/subjective aspects and voted to re-open. – David Thomas Jan 08 '10 at 19:19
  • The question as posed still begs the question by asserting that "no quotes around attributes" is "poorly written". Valid HTML is valid HTML. – Jonathan Feinberg Jan 08 '10 at 19:23
  • @Jonathan, it's really an unimportant quibble. Yes, you are correct, but try running the Google home page through a validator. – Kenan Banks Jan 08 '10 at 19:25
  • Voting to reopen. I find this a fair question to ask seeing that people are getting flamed on SO and elsewhere when not using valid HTML. – Pekka Jan 08 '10 at 19:30
  • I realize I asked a few questions that really ask for their opinions/comments rather than getting the right answer. Is there a good forum to ask questions of that nature? – netrox Jan 08 '10 at 21:26
  • @netrox, to be honest I'm not sure. I think your question is valid here on Stackoverflow (see, for example: http://blog.stackoverflow.com/2010/01/stack-overflow-where-we-hate-fun/ for Jeff Atwood's summary of what makes a good/valid SO question). The reason I suggested a change to the question, to become community-wiki, is that there isn't necessarily a *correct* answer, merely *popular* answers; the community-wiki seems to be the accepted form for such questions. The questions you asked were perfectly valid though. =) – David Thomas Jan 08 '10 at 23:11
  • Thanks, ricebowl. Jonathan, W3C said that leaving out quotes for attribute values is NOT recommended and that quotes are, by default, required: "By default, SGML *requires* that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa." http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2 – netrox Jan 09 '10 at 02:55

6 Answers

112

Most people have gotten the answer basically right — that the rules are different when you serve a page a billion times a day. Bytes begin to matter, and the current level of compression clearly shows that Google is concerned with saving bandwidth.

A few points:

One, people are implying that Google's reasons for saving bandwidth are financial. Unlikely. Even a few terabytes a day saved on the Google search results page is a drop in the bucket compared to the sum of all their properties: YouTube, Blogger, Maps, Gmail, etc. Much more likely is that Google wants its search results page, in particular, to load as quickly as possible on as many devices as possible. Yes, bytes matter when the page is loaded a billion times a day, but bytes also matter when your user is on a satellite phone in the Sahara and struggling to get 1 kbps.

Two, there is a difference between the codified standards of XHTML and the like, and the de-facto standard of what actually works in every browser ever made since 1994. Here, Google’s scale matters because, where most web developers are happy to ignore any troublesome browser that accounts for less than 0.1% of their users, for Google, that 0.1% is perhaps a half million people. They matter. So their search-results page ought to work on IE 5.5. This is the reason they still use tables for layout on many high-value pages – it’s still the layout that “just works” on the greatest number of browsers.

As an exercise, while an intern at Google, I wrote a perfectly compliant XHTML/CSS version of Google’s search result page and showed it around. Eventually the question came up – why are we serving such hodge-podge HTML? Shouldn’t we be leading the web dev community towards standards? The answer I got was pretty much the second point above. Google DOES follow a standard – not the wouldn’t-it-be-nice standards of web utopia, but the this-has-to-work-absolutely-everywhere standard of reality.

Kenan Banks
10

Google has a good reason for writing bad HTML – every character they strip from the search page will save them probably gigabytes of bandwidth a day.
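As a rough back-of-the-envelope check on the "gigabytes a day" figure, here is a small Python sketch. The numbers are assumptions, not measurements: it borrows the "a billion times a day" page-view figure mentioned elsewhere in this thread, assumes one byte per stripped ASCII character, ignores compression (which would shrink the real saving), and uses a made-up count of 50 attributes per page. The function names are hypothetical.

```python
def daily_bytes_saved(chars_removed, page_views_per_day=1_000_000_000, bytes_per_char=1):
    """Bytes saved per day by removing `chars_removed` characters from every page served.

    Assumes uncompressed ASCII output and a fixed number of page views per day.
    """
    return chars_removed * bytes_per_char * page_views_per_day


def to_gigabytes(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


# One character stripped from a page served a billion times a day.
one_char_gb = to_gigabytes(daily_bytes_saved(1))

# Hypothetical page with 50 attributes: dropping both quotes saves 2 characters each.
quotes_gb = to_gigabytes(daily_bytes_saved(2 * 50))

print(one_char_gb, quotes_gb)
```

Under these assumptions a single character is about one gigabyte a day, so the claim is at least the right order of magnitude before compression is taken into account.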

Tatu Ulmanen
  • Bandwidth costs are mitigated by peering agreements (http://blogs.broughturner.com/2009/04/youtubes-fine-analysts-dont-understand-internet-peering.html), which makes it difficult to estimate to what degree bandwidth conservation competes with the other factors Google considers. – micahwittman Jan 08 '10 at 19:23
  • It doesn't mean that the programmers at Google write invalid HTML. Their clean work most probably goes through a filter that strips any unneeded characters before it reaches the live server (as seen in the source of their homepage). – Pierre-Alain Vigeant Jan 08 '10 at 20:50
  • This myth has been debunked many times by pointing out that Google doesn't even optimize their logo image, which would save them gigabytes of bandwidth *a minute*. – ЯegDwight Jan 29 '10 at 22:06
6

As has been discussed previously, Google does it for bandwidth reasons.

As for banks and other enterprisey websites, there could be multiple reasons:

  1. The CMS spits out invalid HTML.
  2. Dreamweaver, enough said.
  3. They tend to use commercial UI components that have been designed to work even on ancient browsers, so they err on the careful side.
  4. No emphasis on good HTML and JavaScript practices. Many of them tend to be Java or .NET shops with no good UI developers.
  5. Badly designed .NET user controls and JSTL taglibs.
Chetan S
  • You forgot MS FrontPage. It produces far worse code than Dreamweaver ever thought of. Can't tell you how many hours I've wasted cleaning up someone else's code from FrontPage just because of all the garbage that makes it nearly impossible to read. – Tom A Jan 08 '10 at 19:42
  • Or export a Word document as HTML and upload it as a web page. Ugly as hell. – Pierre-Alain Vigeant Jan 08 '10 at 20:46
4

For several websites such as Google, having perfect code is not "that" important.

The total size of the web page, however, is. A few bytes spared in the HTML code can mean hundreds of dollars in bandwidth.

So if they can be certain their page will be rendered correctly, they won't hesitate to tweak their HTML.

David Thomas
almathie
2

Generally speaking, coding up a website is easy, so the barrier to entry is very low for inexperienced programmers and non-programmers. This makes it easy to produce substandard pages, and the web is littered with them. Combine that with tools like Microsoft FrontPage that make it even easier to build a site (and even easier to generate bad HTML code) and you've got a nasty situation.

Noufal Ibrahim