
RFC 2616, the HTTP/1.1 standard, defines a quoted string as follows.

quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
quoted-pair    = "\" CHAR
CHAR           = <any US-ASCII character (octets 0 - 127)>
qdtext         = <any TEXT except <">>
TEXT           = <any OCTET except CTLs, but including LWS>

With this definition, `\` seems to be a TEXT (and hence a qdtext), and therefore <">\<"> (quote, backslash, quote) seems to be a valid quoted string. But this contradicts the use of the backslash as an escape character, and it can even make it impossible to determine the end of the quoted string unambiguously. Where is my error here?

The RFC also states

LWS            = [CRLF] 1*( SP | HT )
All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

I have read the interpretation that even LWS inside quoted strings can be replaced by SP. If I take the RFC literally, that's what it says. I am puzzled by this, since it means the quoted strings " ", "\n ", "\n\t \t \t", … are all the same. Can those quoted strings really not be semantically distinguished?

Sandra Rossi
johannes
  • For information, in case someone looks for the syntax of quoted string (or anything else), **the RFC2616 has been made obsolete by RFC7230-7235 ([source](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#History))**, and the one which contains the corrected `quoted-string` syntax is RFC7230. – Sandra Rossi Aug 15 '21 at 13:17

1 Answer

Re question 1: It's a bug in the RFC.

See HTTPbis WG ticket 31 and HTTPbis, Part 1, Section 3.2.3.
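To illustrate the fix, here is a minimal Python sketch of a parser for the corrected grammar as published in RFC 7230, Section 3.2.6, where `qdtext` no longer includes the backslash. The function and helper names are made up for this example, and it assumes the header octets have been decoded so that each character is one octet (for example via Latin-1).

```python
from typing import Tuple


def _is_qdtext(ch: str) -> bool:
    # qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text (%x80-FF)
    # Note: %x22 (double quote) and %x5C (backslash) are excluded.
    o = ord(ch)
    return (ch in "\t " or o == 0x21 or 0x23 <= o <= 0x5B
            or 0x5D <= o <= 0x7E or 0x80 <= o <= 0xFF)


def _is_quotable(ch: str) -> bool:
    # quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
    o = ord(ch)
    return ch in "\t " or 0x21 <= o <= 0x7E or 0x80 <= o <= 0xFF


def parse_quoted_string(data: str, start: int = 0) -> Tuple[str, int]:
    """Return (unescaped value, index just past the closing quote)."""
    if start >= len(data) or data[start] != '"':
        raise ValueError("quoted-string must start with a double quote")
    out = []
    i = start + 1
    while i < len(data):
        ch = data[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            # A backslash is never qdtext, so it always starts a quoted-pair.
            if i + 1 >= len(data) or not _is_quotable(data[i + 1]):
                raise ValueError("invalid quoted-pair")
            out.append(data[i + 1])
            i += 2
            continue
        if not _is_qdtext(ch):
            raise ValueError("character not allowed in quoted-string")
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted-string")
```

With this grammar, the input quote, backslash, quote is rejected as unterminated, because the backslash consumes the second quote as part of a quoted-pair. The end of the string is therefore always unambiguous.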

Re question 2: See HTTPbis, Part 1, Section 3.2.1. So no, you can't distinguish these.
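As a rough sketch of why they collapse, the following Python snippet applies the replacement that RFC 2616 permits ("replace any linear white space with a single SP") to variants like those in the question. The regular expression is my own transcription of the `LWS` production, `collapse_lws` is a hypothetical helper name, and the line breaks are written as CRLF because a bare `\n` is not part of `LWS`.

```python
import re

# LWS = [CRLF] 1*( SP | HT ), transcribed from the RFC 2616 production
LWS = re.compile(r"(?:\r\n)?[ \t]+")


def collapse_lws(field_value: str) -> str:
    """Replace every run of linear white space with a single SP."""
    return LWS.sub(" ", field_value)


# Quoted strings that differ only in the white space between the quotes
variants = ['" "', '"\r\n "', '"\r\n\t \t \t"']
normalized = {collapse_lws(v) for v in variants}
```

After this step `normalized` contains a single element, the quoted string holding one space, so a recipient that normalizes this way cannot tell the variants apart.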

Julian Reschke
  • This answers the escaping issue, but what about the LWS semantics question? – johannes Oct 25 '11 at 11:34
  • First, thanks for this Q&A, it helped me, and I [wrote a summary](https://evolvis.org/pipermail/evolvis-platfrm-discuss/2014-November/000675.html) at work. @johannes yes they cannot. You can backslash-escape all literal space and tab characters inside a quoted string, so `$' \t'` becomes `$'\\ \\ \\\t'` (or, if you want a more visual representation, `··→` becomes `"\·\·\→"`), and newlines and other control characters are just lost. – mirabilos Nov 29 '14 at 14:34