
I'm having some trouble parsing HTTP headers.

Here is my problem:

char resp[] = "HTTP/1.1 200 OK\r\n"
              "Content-Type: text/html\r\n"
              "Content-Length: 4\r\n"
              "\r\n"
              "text";

// some stuff
sscanf(resp, "HTTP/%f %d\r\n",&version,&code);
sscanf(resp, "%*[^]Content-Length: %d",&size);
//            ^ tried several things here

I thought using sscanf would be a good idea, since I only want to get a few values (if they exist).
My idea was to skip all the headers I don't want.

My questions are:
1- Is sscanf a good idea?
2- If not, what approach would work better?

Thank you.

Zentdayn
  • OK, scanf -> bad idea; I might just use strstr to search for the things I want and copy them. I want to keep it simple. I guess the people who replied assumed I knew more than I actually know. – Zentdayn Jul 18 '12 at 02:02
  • I don't see where anyone assumed any knowledge on your part ... that doesn't follow just because you would have to obtain more knowledge in order to follow their suggestions. – Jim Balter Jul 18 '12 at 02:12
  • @JimBalter You are right, I did not express myself correctly. I meant to say that their suggestions do require that I obtain more knowledge on several subjects, but I was looking for a solution that I could come up with myself. What I wrote sounds different in my native language, which didn't help. – Zentdayn Jul 18 '12 at 02:41
  • Sooner or later you are going to wish for something more powerful than strstr. You would be way ahead of the game if you mastered finding, obtaining, linking with, and using libraries. BTW, that's a very interesting comment about something sounding different in one language than when expressed in another. In any case, your English is quite good. – Jim Balter Jul 18 '12 at 03:00

3 Answers


To first order, one should never use the *scanf functions.

Parsing HTTP headers is significantly harder than it appears. I would first see if libcurl has already implemented something you can use, and failing that, go straight to flex and bison.
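If you do go the libcurl route, it can hand you each header line through a callback, so you never have to locate the header boundaries yourself. Here is a minimal sketch using CURLOPT_HEADERFUNCTION and CURLOPT_HEADERDATA; the URL is a placeholder, error checking is omitted, and you would link with -lcurl:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */
#include <curl/curl.h>

/* Called by libcurl once per header line (status line included).
   The buffer is NOT NUL-terminated, so copy before using str* functions. */
static size_t on_header(char *buf, size_t size, size_t nitems, void *userdata)
{
    long *content_length = userdata;
    size_t len = size * nitems;
    char line[256];
    size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

    memcpy(line, buf, n);
    line[n] = '\0';
    if (strncasecmp(line, "Content-Length:", 15) == 0)
        *content_length = strtol(line + 15, NULL, 10);
    return len;   /* tell libcurl the whole line was consumed */
}

int main(void)
{
    long content_length = -1;
    CURL *curl = curl_easy_init();

    if (!curl)
        return 1;
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");  /* placeholder */
    curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, on_header);
    curl_easy_setopt(curl, CURLOPT_HEADERDATA, &content_length);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    printf("Content-Length: %ld\n", content_length);
    return 0;
}

Note that libcurl does not NUL-terminate the callback buffer, which is why the sketch copies each line before calling any str* function on it.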

zwol

The benefit of using libraries is that you don't have to understand how they work.

The problem of using libraries is that you don't have to understand how they work.

If your application has to meet certain constraints (security and speed come to mind for a server), you will have to spend more time on the implementation details - and that means understanding the problem well enough to find a decent solution.

That's what programming is all about.

Tip: not using libraries might be the best way to approach HTTP header parsing.

Borg
  • Not using libraries is also a great way to continually reinvent the wheel, only to find that your home-grown square wheel doesn't roll as well as the round one provided by the library. Unless your goal is to write a function to parse HTTP headers, use a library. – sfstewman Jul 18 '12 at 08:49

First answer: don't do it. There are enough weird HTTP encodings and case mappings and other oddities that you are probably going to get it wrong doing it yourself. But if you ignore this fine advice, then...

Second answer: Don't use sscanf. It always ends in tears. Consider putting the string through a regular expression library and capturing what you want, or parsing the string line by line. You could do a strstr for "\r\nContent-Length: " but that is not going to stop at the end of headers and may match something unexpected in the body. You could search for \r\n\r\n first and find out where that ends and then do strstr up until then, but at that point you are double-searching.
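For illustration, here is a minimal sketch of that second variant: find \r\n\r\n first, then strstr only within the header block. header_content_length is a hypothetical helper, and note it is case-sensitive even though real HTTP header names are not:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the Content-Length value, or -1 if the
   header is missing. Searches only up to the end of the header block,
   so a match inside the body cannot fool it. Case-sensitive, unlike
   real HTTP header names, and it does the double search noted above. */
static long header_content_length(const char *resp)
{
    const char *end = strstr(resp, "\r\n\r\n");          /* end of headers */
    const char *p   = strstr(resp, "\r\nContent-Length:");

    if (!end || !p || p > end)
        return -1;
    return strtol(p + strlen("\r\nContent-Length:"), NULL, 10);
}

int main(void)
{
    char resp[] = "HTTP/1.1 200 OK\r\n"
                  "Content-Type: text/html\r\n"
                  "Content-Length: 4\r\n"
                  "\r\n"
                  "text";
    printf("%ld\n", header_content_length(resp));        /* prints 4 */
    return 0;
}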

Seth Robertson