Web server - how to parse requests? Asynchronous Stream Tokenizer?

Question

I'm attempting to create a simple web server in C# using asynchronous socket programming. The purpose is very narrow: a Comet server (HTTP long polling).

I've got the Windows service running, accepting connections, dumping request info to the Console, and returning simple fixed content to the client.

Now I can't figure out a manageable strategy for parsing the request data asynchronously and safely. I've written synchronous LL(1) parsers before, but I'm not sure whether an LL(1) parser is appropriate or necessary for HTTP, and I don't know how to tokenize the input stream asynchronously. All I can think of is having an input buffer per client, reading into that, then copying it to a StringBuilder and periodically checking whether I have a complete request. But that seems inefficient and might lead to code that is difficult to debug and maintain.

Also, the connection has two phases: receiving the request in full, and then sending a response (in this case, after some delay). Only once the request is validated and actionable am I planning to enroll the connection in the long-polling manager. However, a misbehaving client could continue to send data and fill up a buffer, so I think I need to continue to monitor and empty the input stream during the response phase, right?

Any guidance on this is appreciated.

I guess the first step is knowing whether it is possible to efficiently tokenize a network stream asynchronously and without a large intermediate buffer. Even without a proper parser, the same challenges of creating a tokenizer apply to reading "lines" of input at a time, or even reading until double blank lines (one big token). I don't want to read one byte at a time from the network, but neither do I want to read too many bytes and have to store them in some intermediate buffer, right?

score 2 · Accepted Answer · answered Feb 27 '11 at 20:55

For HTTP, the simplest approach is to read the headers into memory completely (until you receive \r\n\r\n), then split on \r\n to get the individual lines (the first is the request line, the rest are headers), and split each header at its first : to separate name from value.

There's no need to use a complex parser for that.
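
As a rough illustration of the splitting described above, here is a minimal sketch. The `HttpHeadParser` class and its `Parse` method are hypothetical names, not part of any library. It assumes the complete head is already in a string, that header names are compared case-insensitively, and that a repeated header simply overwrites the earlier value; it does not handle obsolete folded (multi-line) headers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a complete request head into request line + headers.
static class HttpHeadParser
{
    public static Dictionary<string, string> Parse(string head, out string requestLine)
    {
        // Split on CRLF; the trailing blank line(s) of the head are dropped.
        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0)
            throw new FormatException("Empty request head.");

        requestLine = lines[0]; // e.g. "GET /poll HTTP/1.1"

        // Assumption: case-insensitive names, last value wins for duplicates.
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            // Split at the FIRST colon only; values may contain colons themselves.
            int colon = lines[i].IndexOf(':');
            if (colon <= 0)
                throw new FormatException("Malformed header line: " + lines[i]);

            string name = lines[i].Substring(0, colon).Trim();
            string value = lines[i].Substring(colon + 1).Trim();
            headers[name] = value;
        }
        return headers;
    }
}

static class Program
{
    static void Main()
    {
        string head = "GET /poll HTTP/1.1\r\nHost: example.com:8080\r\nAccept: */*\r\n\r\n";
        string requestLine;
        Dictionary<string, string> headers = HttpHeadParser.Parse(head, out requestLine);

        Console.WriteLine(requestLine);      // GET /poll HTTP/1.1
        Console.WriteLine(headers["host"]);  // example.com:8080
    }
}
```

Splitting at the first colon rather than with `Split(':')` keeps values such as `example.com:8080` intact.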

answered Feb 27 '11 at 20:55

ThiefMaster

Thanks. What's a good way to check for that sequence that is resilient to it straddling buffered reads? When is the request *too* long? Do I need to keep emptying the input stream after that point, or can I safely ignore additional input without worrying about a buffer filling up and affecting other connections? – Jason Kleban Feb 27 '11 at 22:36
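
One possible way to handle the first two points raised in this comment is sketched below; it is not taken from the answer. It keeps a small match counter per connection so that \r\n\r\n is recognized even when it is split across two receives, and it rejects the request once the head exceeds a caller-chosen limit. `RequestHeadBuffer`, `Append`, and the 8192-byte limit are all hypothetical; the sketch says nothing about whether input must still be drained afterwards.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical per-connection accumulator for the request head.
sealed class RequestHeadBuffer
{
    private static readonly byte[] Terminator = { 13, 10, 13, 10 }; // \r\n\r\n
    private readonly MemoryStream _head = new MemoryStream();
    private readonly int _maxBytes;
    private int _matched; // terminator bytes matched so far; survives across reads

    public RequestHeadBuffer(int maxBytes) { _maxBytes = maxBytes; }

    public bool IsComplete { get; private set; }

    // Feed the bytes of one completed receive. Returns how many were consumed;
    // bytes after that belong to the body or to a following request.
    public int Append(byte[] buffer, int offset, int count)
    {
        if (IsComplete) return 0;
        for (int i = 0; i < count; i++)
        {
            if (_head.Length >= _maxBytes)
                throw new InvalidDataException("Request head exceeds the size limit.");

            byte b = buffer[offset + i];
            _head.WriteByte(b);

            // Tiny state machine: advance on a match, otherwise restart
            // (a stray \r may itself begin a new terminator).
            if (b == Terminator[_matched]) _matched++;
            else _matched = (b == 13) ? 1 : 0;

            if (_matched == Terminator.Length)
            {
                IsComplete = true;
                return i + 1;
            }
        }
        return count;
    }

    public string GetHeadText()
    {
        return Encoding.ASCII.GetString(_head.ToArray());
    }
}

static class Program
{
    static void Main()
    {
        var head = new RequestHeadBuffer(8192); // limit is an arbitrary assumption
        byte[] first = Encoding.ASCII.GetBytes("GET /poll HTTP/1.1\r\nHost: example.com\r\n\r");
        byte[] second = Encoding.ASCII.GetBytes("\nleftover");

        head.Append(first, 0, first.Length);
        Console.WriteLine(head.IsComplete);                 // False
        int used = head.Append(second, 0, second.Length);
        Console.WriteLine(head.IsComplete + " " + used);    // True 1
    }
}
```

In an asynchronous server, `Append` would be called from each receive-completion callback with the bytes just read, and the caller would close the connection or send an error response if the limit exception is thrown.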
