
I am writing an HTTP header parser in YACC. Since HTTP requests and responses have the same structure except for the first line, I hope to use the same parser for both. I tested request_line and response_line individually, and they work on an HTTP request and an HTTP response respectively. However, when I combine them in the following way, http_headers only matches the HTTP request rules and raises "syntax error, unexpected t_backslash, expecting t_digit or t_dot or t_token_char or t_sp" when given the HTTP response HTTP/1.1 200 OK\r\nHost: foo.com\r\nConnection: Keep-alive\r\n\r\n. How can I make start_line match either request_line or response_line?

0 $accept: request $end

1 allowed_char_for_token: t_token_char
2                       | t_digit
3                       | t_dot

4 token: allowed_char_for_token
5      | token allowed_char_for_token

6 allowed_char_for_text: allowed_char_for_token
7                      | t_separators
8                      | t_colon
9                      | t_backslash

10 text: allowed_char_for_text
11     | text ows allowed_char_for_text

12 ows: %empty
13    | t_sp
14    | t_ws

15 t_number: t_digit
16         | t_number t_digit

17 request_line: token t_sp text t_sp text t_crlf

18 response_line: text t_sp t_number t_sp text t_crlf

19 header: token ows t_colon ows text ows t_crlf

20 headers: header
21        | header headers

22 start_line: request_line
23           | response_line

24 http_headers: start_line headers t_crlf

(My apologies for the confusing names. By http_headers I mean the first line plus the rest of the headers; I am not aware of a better name for it.)

user274602
  • You need to provide us with more of the grammar to help diagnose the problem. Your complaint includes a "t_backslash" but you didn't show us the lexer/grammar rules that produce it. – Ira Baxter Nov 23 '16 at 17:06
  • @IraBaxter Updated the original post. The "unexpected t_backslash" error comes from yacc trying to parse a response_line as a request_line. What I intended is to pattern match on the first line and process it as a request_line if it matches the request_line rules, and as a response_line if it matches the response_line rules. However, it currently only applies the request_line rules and raises an error if the input does not match. – user274602 Nov 23 '16 at 17:24
  • Why are you feeding it a backslash? You should be feeding it a real carriage return and a real line feed, not backslashes. – user207421 Nov 23 '16 at 17:40
  • @EJP I copied the input string from C code, so I should be feeding it a real carriage return and newline, not a literal \r\n. – user274602 Nov 23 '16 at 17:45
  • Clearly not. The lexer recognized it as a backslash. 'Copied the input string from C code' doesn't prove otherwise. – user207421 Nov 23 '16 at 22:09

1 Answer


You are feeding it a backslash instead of a carriage return/line feed. Clearly you copied a C string literal into something else that doesn't implement C string escaping conventions.
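
To illustrate the difference, here is a minimal C sketch. The names count_backslashes, escaped_input and literal_input are made up for this example and do not come from the question. It assumes the test input was pasted into something that does not process C escapes, such as a file or a command-line argument, which is a guess about how the input was produced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts literal backslash bytes in s. */
size_t count_backslashes(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == '\\') {
            ++n;
        }
    }
    return n;
}

/* Compiled as a C string literal: \r and \n each become ONE byte
 * (0x0D and 0x0A), so the lexer never sees a backslash. */
const char escaped_input[] = "HTTP/1.1 200 OK\r\n";

/* The same text with no escape processing: the four bytes
 * '\\', 'r', '\\', 'n' reach the lexer, which then reports t_backslash. */
const char literal_input[] = "HTTP/1.1 200 OK\\r\\n";
```

If count_backslashes returns a nonzero value for the buffer you hand to the lexer, the input contains literal backslashes rather than a CR/LF pair.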

I wouldn't use something as precise as yacc for this task. I wouldn't use anything more precise than a hand-written tokenizer. And I would certainly not present individual characters from an end of line sequence to the parser.
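
As a rough sketch of the hand-written approach, the function below decides whether the first line is a request line or a status line before any further parsing. It relies on the HTTP/1.1 message format, where a status line starts with the HTTP version and a request line ends with it. The enum and the function name classify_start_line are hypothetical, and the checks are deliberately loose (no validation of the method, target, or status code).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum start_line_kind {
    START_LINE_INVALID,
    START_LINE_REQUEST,
    START_LINE_RESPONSE
};

/* Hypothetical helper: classifies the first line of an HTTP message.
 * msg must be NUL-terminated and use real CR LF bytes as line endings. */
enum start_line_kind classify_start_line(const char *msg) {
    assert(msg != NULL);

    /* The start line ends at the first CR LF pair. */
    const char *eol = strstr(msg, "\r\n");
    if (eol == NULL) {
        return START_LINE_INVALID;
    }
    size_t len = (size_t)(eol - msg);

    /* Status line: HTTP-version SP status-code SP reason-phrase */
    if (len >= 5 && memcmp(msg, "HTTP/", 5) == 0) {
        return START_LINE_RESPONSE;
    }

    /* Request line: method SP request-target SP HTTP-version */
    const char *first_sp = memchr(msg, ' ', len);
    if (first_sp == NULL) {
        return START_LINE_INVALID;
    }
    const char *last_sp = first_sp;
    for (const char *p = first_sp + 1; p < eol; ++p) {
        if (*p == ' ') {
            last_sp = p;
        }
    }
    if (last_sp == first_sp) {
        return START_LINE_INVALID; /* only one space: not three fields */
    }
    size_t tail = (size_t)(eol - (last_sp + 1));
    if (tail >= 5 && memcmp(last_sp + 1, "HTTP/", 5) == 0) {
        return START_LINE_REQUEST;
    }
    return START_LINE_INVALID;
}
```

With a check like this, the line ending is consumed as a unit, and the remaining header lines can go through one shared routine regardless of which kind of start line was found.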

user207421