
My issue is the following. I have an XHTML 1.1 page that has a form and input fields. One of the input fields contains a value which is a URI. This URI contains key-value pairs with the ampersand (&) as argument separator, and it will be passed as a GET request by another web application in the browser.

Usually I would use the entity `&amp;` to write the ampersands so that the code validates as XHTML 1.1. My problem here is that the application does not receive the GET request, since (as expected) the browser does not understand how to handle `&amp;` in the URI.

So my question is really how to write an ampersand without using the HTML entity, so the browser still recognises it as the argument separator and the GET request is passed on properly to the web app.

I tried Hex (%26) encoding the ampersand but the browser still does not "translate" it back to a proper & character.

A related question, but it does not provide the exact answer to the question I am asking:

XHTML and & (Ampersand) encoding

mr-euro
  • "(as expected) the browser does not understand how to handle `&amp;` in the URI" -- that is **not** as expected, you should not see `&amp;` in the address bar unless you have double-encoded it. – Ben James Jan 13 '10 at 23:25
  • Please re-read the question. The ampersands are part of a URI contained inside an input's value attribute. After the form is submitted the user will be returned to that same location, exactly as it is written. This means that either I leave the ampersands un-encoded but fail validation, or I encode them, with the problem that the browser will receive the HTML entity in the address bar and fail to pass on the query-string to the next app in the process. – mr-euro Jan 13 '10 at 23:33
  • The browser should decode the `&amp;` to `&` when converting the HTML to a DOM. It should then encode the `&` as `%26` when constructing the URL or `application/x-www-form-urlencoded` data. If it doesn't work, then I suspect you are handling the data incorrectly on the server. – Quentin Jan 13 '10 at 23:39
  • David I was just replying to your previous comment you deleted: Yes, that is more what I am looking for. I did try urlencode, rawurlencode, htmlentities and htmlspecialchars before, but only on the querystring part. What you are saying is to apply it to the entire URI. I will give it a try, although I think in the past I have experienced that the slashes (//) in http:// must not be encoded in the address bar. – mr-euro Jan 13 '10 at 23:42
  • You shouldn't need to touch urlencode, rawurldecode or htmlentities. All you should need to do is to take the URL you want to redirect to, run htmlspecialchars over it, then set that as the value of the value attribute for the form control (and set it with HTML, not JavaScript). Then, in PHP, $_GET['control_name'] will give you the URL, and you can redirect to it (after you perform tests to make sure that it is on your site so people can't use your domain to mask their spam links). (Ignore the answer that I deleted, it was wrong and would double URL encode the data). – Quentin Jan 14 '10 at 00:06
  • You need to give us a minimal example, because you have not given us enough information to diagnose the problem for you. – Breton Jan 14 '10 at 00:27
  • The problem is that the redirect comes back from a 3rd party. I send the value of the URI that the 3rd party must return back to the user's browser after other processing has happened. Therefore I can not go back and decode the URI afterward. I must send it exactly as the browser should receive it. – mr-euro Jan 14 '10 at 00:47
  • I am considering either changing arg_separator.input to a semi-colon, or perhaps simply using one key-value pair to avoid the problem altogether, passing the entire query-string as one single string which can be parsed instead. – mr-euro Jan 14 '10 at 00:49

5 Answers


There is no way to include an ampersand character in an attribute value without using an entity.

There is no way to include an ampersand character in a text node without using an entity or CDATA markers (but I bet you are serving as text/html so you can't use those).

That said — any browser which fails to decode the entity is broken. No mainstream browser fails there. You are either using an obscure and broken browser, or are misdiagnosing the problem.
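
As a rough illustration of that decoding step, here is a minimal sketch using Python's standard-library HTML parser as a stand-in for a browser. The return URI and the field name `return` are hypothetical, chosen only for the example:

```python
from html import escape
from html.parser import HTMLParser


class ValueCollector(HTMLParser):
    """Collect the decoded value attribute of every <input> element."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over attribute values with character
        # references such as &amp; already decoded.
        if tag == "input":
            self.values.extend(v for k, v in attrs if k == "value")


# Hypothetical return URI, used only for illustration.
return_uri = "http://example.com/back?foo=1&bar=2"

# What goes into the XHTML source: the ampersand is written as &amp;.
markup = '<input type="hidden" name="return" value="%s" />' % escape(
    return_uri, quote=True
)

parser = ValueCollector()
parser.feed(markup)
parser.close()

# The value as the parser sees it, with a plain & again.
decoded = parser.values[0]
print(markup)
print(decoded)
```

The markup contains `&amp;`, while the value recovered by the parser contains a plain `&`, which is the behaviour this answer attributes to mainstream browsers.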

Quentin
  • Any major browser (IE or FF will do). The browser handles the decoding properly inside the HTML. I am referring to actually using the HTML entity in the address bar. Try it... – mr-euro Jan 13 '10 at 23:24
  • Well don't do that! You type plain URLs into the address bar, not HTML encoded URLs. That's like opening a Microsoft Word document in Notepad. – Quentin Jan 13 '10 at 23:26
  • The browser is redirected to that URI as it was typed directly into the address bar, including HTML entities. That is my issue. – mr-euro Jan 13 '10 at 23:44
  • Either the browser is redirected to the URI, or the URI is typed into the address bar. It can't be both. – Quentin Jan 14 '10 at 00:03
  • Obviously the former. What I am saying is that the effect is the same as if the URI was typed directly into the address bar (vs. being e.g. an anchor being clicked on). – mr-euro Jan 14 '10 at 00:43
  • So either your input is wrong (it still isn't clear what the input actually is, but it sounds like it should be an HTML encoded URI with any URI encoding of the ampersands being handled by the browser) or the server side form processor is broken. – Quentin Jan 14 '10 at 08:46

As mentioned in the other question you referenced, the browser converts the `&amp;` to `&` when the page is processed, so the "&" (not `&amp;`) should be sent to the server in the GET request. Perhaps you are using Ajax to make the GET request, in which case you may need to decode the HTML yourself. The entity is required for XHTML: there is no alternative encoding, so just make sure it is properly decoded.

Reference: The `&amp;` changes to & in a hyperlink
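
To make the round trip concrete, the following sketch approximates what happens after the entity has been decoded: the form value is percent-encoded when the GET request is built, and the receiving application decodes it back. It uses Python's `urllib.parse` rather than a real browser, and the host `thirdparty.example`, the field name `return` and the return URI are all assumptions made for the example:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical return URI, already decoded from the HTML (plain &).
return_uri = "http://example.com/back?foo=1&bar=2"

# Roughly what a browser does when it submits a GET form containing
# a field named "return": the value is percent-encoded, so the
# ampersand inside it travels as %26.
query = urlencode({"return": return_uri})
request_url = "http://thirdparty.example/process?" + query

# What the receiving application sees once it parses the query string.
received = parse_qs(urlsplit(request_url).query)["return"][0]

print(request_url)
print(received)
```

Here the ampersand inside the value is sent as `%26`, and the receiving side gets the original URI back after ordinary query-string parsing. A real browser may leave a few more characters unencoded than `urlencode` does, but the ampersand inside a field value is always encoded.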

Doug Domeny
  • The issue is that the browser receives the HTML entity directly into the address bar (as if it was typed directly). I am not referring to the decoding that happens automatically e.g. if you use the ampersand equivalent entity inside an anchor. – mr-euro Jan 13 '10 at 23:37
  • How is the URI being put in the input field? If the value is part of the HTML, then it should be the entity name, if set using JavaScript, then it should not. – Doug Domeny Jan 14 '10 at 13:01

The escaped `&amp;` should be converted back to & by the client (browser) everywhere in the XHTML document.

So you should escape every & as `&amp;`.

KARASZI István

Without the code it's difficult to tell where you are trying to keep this information. If you could post the code, we could do a better job of understanding the problem.

One possible solution (if this is in fact what you are facing) is to move the items in the query string into other form elements, such as changing:

<form action="example.com/?foo=1&bar=2">
    <!-- ... -->
</form>

to:

<form action="example.com">
    <input type="hidden" name="foo" value="1" />
    <input type="hidden" name="bar" value="2" />
    <!-- ... -->
</form>
mynameiscoffey
  • The querystring is not in the actual form action, but inside an input field's value field. It is a value which gets passed to a web app, which later returns the user's browser to that same URI (with the query-string in it). This is where it fails since the browser can not understand the HTML entity in the address bar. – mr-euro Jan 13 '10 at 23:26
  • Gotcha, my bad. If that is the case can't you just escape it when you stick it in the input field (which is probably best to do anyway to avoid any XSS attacks) and then un-escape it before you do the redirect server-side? – mynameiscoffey Jan 14 '10 at 00:23
  • Unfortunately the redirect comes from a 3rd party. So I need to send the URI exactly as I need it to come back... the 3rd party simply receives it and returns the user's browser to it after it has done other work. – mr-euro Jan 14 '10 at 00:41

I could not be bothered to spend more time on this. I simply changed the argument separator to also accept the semi-colon (;), so I can use it instead of the ampersand:

#cat .htaccess
php_value arg_separator.input "&;"
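
With that setting, a query string such as `foo=1;bar=2` needs no escaping in the XHTML source, because `;` is not a special character there. As a rough illustration of accepting both separators (a sketch in Python, not PHP's actual parser; `parse_query` is a hypothetical helper name):

```python
import re
from urllib.parse import unquote_plus


def parse_query(query):
    """Split a query string on '&' or ';' and decode each key-value pair."""
    pairs = []
    for part in re.split(r"[&;]", query):
        if not part:
            # Skip empty segments such as a trailing separator.
            continue
        key, _, value = part.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


print(parse_query("foo=1;bar=2"))
print(parse_query("foo=1&bar=2"))
```

Both calls yield the same list of pairs, so links written with `;` and links written with `&` are treated alike.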
mr-euro