36

My website is XHTML Transitional compliant except for one thing: the & (ampersand) characters in URLs are written as-is, instead of as &amp;.

That is, all the URLs in my pages are usually like this:

<a href="http://www.example.org/page.aspx?x=1&y=2">Foo</a>

But the XHTML validator generates this error:

cannot generate system identifier for general entity "y"

... and it wants the URL to be written like this:

<a href="http://www.example.org/page.aspx?x=1&amp;y=2">Foo</a>

The problem is that Internet Explorer and Firefox don't handle the URL correctly and ignore the y parameter. How can I make this link work and validate correctly?

It seems to me that it is impossible to write XHTML pages if browsers don't work with strictly encoded XHTML URLs.

Do you want to see it in action? Compare these two links (copy and paste them as they are):

http://stackoverflow.com/search?q=ff&sort=newest

and

http://stackoverflow.com/search?q=ff&amp;sort=newest
Peter Mortensen
  • Using &amp; in the URLs of your XHTML doc should work just fine, so the problem is likely elsewhere, as others have already pointed out. If you generate the URL with a server-side scripting language, perhaps you can post some of that code so others can see whether the problem is there? – Daan Nov 08 '08 at 21:43

5 Answers

57

I have just tried this. What you attempted to do is correct. In HTML, when you write a link, the & characters should be encoded as &amp;. You would only encode & as %26 if you wanted a parameter value to contain an ampersand. I wrote a simple HTML page that contained a link: <a href="Default2.aspx?param1=63&amp;param2=hel">Click me</a> and it worked fine: Default2.aspx received the intended parameters and the source passed validation.

The encoding of & as &amp; is required by HTML, not by the link. When the browser sees &amp; in the HTML source of a link, it interprets it as an ampersand, and the link target is as intended. If you paste a URL into your browser's address bar, the browser does not expect it to be HTML and does not try to interpret any HTML encoding it may contain. This is why the example links you suggest we copy and paste into a browser don't work, and why we wouldn't expect them to.
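
As a rough illustration of the difference (a Python sketch, not what any browser literally runs), the standard library's html.unescape can stand in for the entity decoding a browser applies to an attribute value, and urllib.parse can stand in for the server's query-string parsing. The explicit separator="&" argument assumes a reasonably recent Python 3.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Attribute value exactly as it appears in the HTML source of the page.
href_in_source = "http://stackoverflow.com/search?q=ff&amp;sort=newest"

# Following a link: HTML entities are decoded first, then the result is used as the URL.
url_from_link = unescape(href_in_source)
params_from_link = parse_qs(urlsplit(url_from_link).query, separator="&")

# Pasting the same text into the address bar: no HTML decoding happens,
# so the second parameter ends up being named "amp;sort".
params_from_paste = parse_qs(urlsplit(href_in_source).query, separator="&")

print(url_from_link)
print(params_from_link)
print(params_from_paste)
```

The link version yields the parameters q and sort, while the pasted version yields q and a parameter named amp;sort, which the server does not recognise.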

If you post a bit more of your actual code we might be able to see what has gone wrong, but you appear to be heading in the right direction by using &amp; in your anchor tags.

TRiG
pipTheGeek
6

It was my fault: the hyperlink control already encoded &, so my URL http://foo?x=1&amp;y=2 was encoded to http://foo?x=1&amp;amp;y=2

Normally the &amp; inside the URL is handled correctly by browsers, as you stated.
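
The double encoding described above can be reproduced with a small Python sketch. Here html.escape is only a stand-in for whatever the hyperlink control does internally, which is an assumption and not its actual implementation.

```python
from html import escape, unescape

url = "http://foo?x=1&y=2"

# Encoding by hand, as was done before passing the URL to the control.
encoded_once = escape(url, quote=True)

# The control (simulated here) encodes the value again when rendering.
encoded_twice = escape(encoded_once, quote=True)

# The browser decodes entities exactly once, so a literal "&amp;" survives in the URL.
url_seen_by_browser = unescape(encoded_twice)

print(encoded_once)         # http://foo?x=1&amp;y=2
print(encoded_twice)        # http://foo?x=1&amp;amp;y=2
print(url_seen_by_browser)  # http://foo?x=1&amp;y=2
```

The fix in this situation is to hand the control the plain URL and let it encode once.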

Peter Mortensen
5

You could use &amp; instead of & in your URL within your page.

That should allow it to be validated as strict XHTML...

<a href="http://www.example.org/page.aspx?x=1&amp;y=2">Foo</a>

Note, if used by an ASP.NET Request.QueryString function, the query string doesn't use XML encoding; it uses URL encoding:

/mypath/mypage?b=%26stuff

So you need to provide a function translating '&amp;' into %26.

Note: in that case, Server.URLEncode("neetu & geetu"), which would produce neetu+%26+geetu, is not what you want, since you need to translate &amp; into %26, not just '&'. You must add a replace() call applied to the URLEncode result in order to replace '%26amp;' with '%26'.
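
To make the two layers of encoding concrete, here is a hedged Python sketch rather than ASP.NET code: urllib.parse handles the URL encoding of parameter values, and html.escape handles the HTML encoding of the finished URL. The parameter name "name" is hypothetical and only there for illustration.

```python
from html import escape
from urllib.parse import quote_plus, urlencode

value = "neetu & geetu"

# URL encoding: an ampersand inside a value becomes %26, spaces become +.
encoded_value = quote_plus(value)

# Build the query string from raw values; urlencode applies quote_plus to each one.
query = urlencode({"b": "&stuff", "name": value})

# HTML encoding: only the separator ampersand is left to be turned into &amp;.
href = escape("/mypath/mypage?" + query, quote=True)

print(encoded_value)  # neetu+%26+geetu
print(query)          # b=%26stuff&name=neetu+%26+geetu
print(href)           # /mypath/mypage?b=%26stuff&amp;name=neetu+%26+geetu
```

Doing the URL encoding on the raw values first, and the HTML encoding last, avoids the need for a follow-up replace() step in this sketch.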

Peter Mortensen
VonC
  • Sorry, I've replied to you with a new answer instead of a comment... see my answer if you want to follow the thread –  Nov 08 '08 at 21:09
  • Got your comment just now ;) Checking into this issue – VonC Nov 08 '08 at 21:18
0

To be even more thorough: use &#38;, a numeric character reference.

Because &amp; is a character entity reference:

Character entity references are defined in the markup language definition. This means, for example, that for HTML only a specific range of characters (defined by the HTML specification) can be represented as character entity references (and that includes only a small subset of the Unicode range).

That's coming from the wise people at W3C (read this for more).

Of course, this is not a very big deal, but W3C's suggestion is that the numeric reference is valid and usable everywhere and always, while the named one is 'fine' for HTML but nothing more.
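
As a quick check of how an XML parser treats the two forms, the following Python sketch uses the standard library's xml.etree.ElementTree (a non-validating parser, so it says nothing about DTD validity). It compares the named reference, the numeric reference, and a raw ampersand in an attribute value.

```python
import xml.etree.ElementTree as ET

# Named and numeric references both decode to a plain ampersand.
named = ET.fromstring('<a href="page.aspx?x=1&amp;y=2">Foo</a>')
numeric = ET.fromstring('<a href="page.aspx?x=1&#38;y=2">Foo</a>')

print(named.get("href"))    # page.aspx?x=1&y=2
print(numeric.get("href"))  # page.aspx?x=1&y=2

# A raw ampersand is not well-formed XML, so the parser rejects it.
try:
    ET.fromstring('<a href="page.aspx?x=1&y=2">Foo</a>')
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

print(raw_is_well_formed)  # False
```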

kasimir
  • That document also says that [`&amp;` works everywhere](http://www.w3.org/International/questions/qa-escapes#use). – Quentin Sep 25 '14 at 19:13
  • The named ones are mapped to numeric references per the DTD that's applied. Since `&amp;` is supported by all XML and HTML DTDs, that's pretty much always. However, without the DTD, or with 'plain' SGML parsing, the named one will not work and the numeric one will. Also, XML 1.0 only has five predefined named references, so you could end up mixing names and numbers instead of using just numbers. – kasimir Sep 26 '14 at 07:56
-4

The problem is worse than you think - try it in Safari. &amp;amp; gets converted to &amp;#38; and the hash ends the URL.

The correct answer is to not output XHTML - there's no reason that justifies spending more time on development and alienating Mac users.

Peter Mortensen
Simon