I am trying to locate the start of the HTTP header in a captured packet, for example:

GET /test.html HTTP/1.1
Host: example.com
...

With WinPcap in C, I tried:

ih = (ip_header *) (pkt_data + 14);
ip_len = (ih->ver_ihl & 0xf) * 4;

th = (tcp_header *) ((u_char*)ih + ip_len);
tcp_len = (((u_char*)ih)[ip_len + 12] >> 4) * 4;

tcp_payload = (u_char*)ih + ip_len + tcp_len;
url = tcp_payload + 4;
end_url = strchr((char*)url, ' ');
url_length = end_url - url;

req_url = (u_char*)malloc(url_length+1);
strncpy((char*)req_url, (char*)url, url_length);
req_url[url_length] = '\0';
printf("%s", req_url); 

The output is sometimes correct, but sometimes it prints part of a cookie or `cept-Encodin`, which is definitely part of the HTTP header but at the wrong position. I think some TCP payloads may differ in length because of extra fields, but I have no idea how to check for that. Thank you.

#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <string.h>
#include "pcap.h"

#include <windows.h> 
#include <winsock.h>

/* 4 bytes IP address */
typedef struct ip_address{
    u_char byte1;
    u_char byte2;
    u_char byte3;
    u_char byte4;
}ip_address;

/* IPv4 header */
typedef struct ip_header{
    u_char  ver_ihl;        // Version (4 bits) + Internet header length (4 bits)
    u_char  tos;            // Type of service 
    u_short tlen;           // Total length 
    u_short identification; // Identification
    u_short flags_fo;       // Flags (3 bits) + Fragment offset (13 bits)
    u_char  ttl;            // Time to live
    u_char  proto;          // Protocol
    u_short crc;            // Header checksum
    ip_address  saddr;      // Source address
    ip_address  daddr;      // Destination address
    u_int   op_pad;         // Option + Padding
}ip_header;

typedef struct tcp_header // TCP header
{
  // For little-endian processors
  unsigned short source;  // Source port
  unsigned short dest;    // Destination port
  unsigned int   seq;     // Sequence number
  unsigned int   ack_seq; // Acknowledgment number

  unsigned short res1:4,  // Reserved 1: 4 bits
                 doff:4,  // Data offset
                 fin:1,   // Flag FINISH
                 syn:1,   // Flag SYNCHRONIZE
                 rst:1,   // Flag RESET
                 psh:1,   // Flag PUSH
                 ack:1,   // Flag ACKNOWLEDGE
                 urg:1,   // Flag URGENT
                 res2:2;  // Reserved 2: 2 bits (res1 + res2 = 6 bits reserved)

  unsigned short window;  // Window size
  unsigned short check;   // Checksum
  unsigned short urg_ptr; // Urgent pointer
}tcp_header;

void packet_handler(u_char *param, const struct pcap_pkthdr *header, const u_char *pkt_data);

int main(int argc, char *argv[]) {

    /* ... adapter setup omitted; adhandle is opened as in the standard WinPcap example ... */

    pcap_loop(adhandle, 0, packet_handler, NULL);

    return 0;
}

/* Callback function invoked by libpcap for every incoming packet */
void packet_handler(u_char *param, const struct pcap_pkthdr *header, const u_char *pkt_data) {

    ip_header *ih;
    tcp_header *th;

    u_int ip_len;
    int tcp_len, url_length, udp_len;
    u_char *url, *cont, *end_url, *final_url, *tcp_payload;

    /* position of the IP header */
    ih = (ip_header *) (pkt_data + 14); // length of the Ethernet header
    ip_len = (ih->ver_ihl & 0xf) * 4;
    /* position of the TCP header */
    th = (tcp_header *) ((u_char*)ih + ip_len);
    tcp_len = (((u_char*)ih)[ip_len + 12] >> 4) * 4;

    tcp_payload = (u_char*)ih + ip_len + tcp_len;

    url = tcp_payload + 4; // skip "GET "
    end_url = (u_char*)strchr((char*)url, ' ');
    url_length = end_url - url;

    if(url_length>0) {
        final_url = (u_char*)malloc(url_length+1);
        strncpy((char*)final_url, (char*)url, url_length);
        final_url[url_length] = '\0';
        printf("\n%s\n", final_url);
    }

}
  • I would suggest adding code to dump the packet in question. You could use something like `strncmp(tcp_payload, "GET ", 4) != 0` as a trigger. How do you make sure that you only process the first packet of a TCP connection? Could it be that you picked up a TCP payload from a request that spans multiple IP frames? – Mario Klebsch Oct 27 '16 at 18:59
  • It looks like there is no-where near enough code here for anyone to reproduce the problem. Please read [ask] and provide an [mcve] – Tibrogargan Oct 27 '16 at 19:04
  • @MarioKlebsch Yes, I think I will have to reassemble the packets later. That is why I want to know how to check packets for situations like that. Sometimes applications also use port 80 and put their "own layer" in HTTP requests, and I don't know how to check for that. I want to learn how to decode the packet to extract all the data; the request URL is just the most interesting part right now. – user7082002 Oct 27 '16 at 19:13
  • If you're dealing with TCP packets, how do you know the entire header is in one packet? And while you haven't posted enough code for me to be certain, it seems you're assuming an HTML header is a fixed-length entity - it's not. – Andrew Henle Oct 27 '16 at 19:36
  • Sorry, I don't know how much code is enough. I thought that was the interesting part; all other parts are from the standard example on the WinPcap homepage. Do you mean a simplified version that you can try? – user7082002 Oct 27 '16 at 19:43
  • Why is there a 14-byte (other type?) offset from the start of `pkt_data` to the start of the IP header? Are you tunneling IP through some other protocol? – John Bollinger Oct 27 '16 at 19:43
  • I thought that was the Ethernet header. I have added some more code. – user7082002 Oct 27 '16 at 19:49
  • When working on captured Ethernet frames, you first have to handle fragmentation at the IP level and re-assemble fragmented IP frames. After the IP frames are re-assembled, you have to use TCP sequence numbers to reconstruct the TCP data stream. The starting sequence number must be obtained from the initial three-way handshake used to set up the TCP connection. You may find inspiration on how to do this by looking into the source code of Wireshark's "Follow TCP Stream" function. – Mario Klebsch Oct 27 '16 at 19:57
  • Thank you. I thought the HTTP header is never fragmented, just the HTML part. So I have to look for the TCP SYN/ACK as the initial entry point, but how do I know whether it is fragmented or not? I had searched for fragmentation in the HTTP header. – user7082002 Oct 28 '16 at 12:16
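
On the question of how to tell whether a packet is fragmented: at the IP level this can be read from the IPv4 header itself. A datagram is a fragment if the "more fragments" bit is set or the fragment offset is non-zero. The following is only a minimal sketch, assuming untagged Ethernet II frames carrying IPv4; `tcp_payload_of` and `ETH_HDR_LEN` are hypothetical names, not part of WinPcap. It also uses the IP total length rather than the captured length to size the payload, so Ethernet padding on short frames is not mistaken for data. It does not reassemble anything and does not detect a request that is split across several TCP segments.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14  /* untagged Ethernet II header, as assumed in the question */

/* Hypothetical helper: returns a pointer to the TCP payload inside a captured
 * IPv4-over-Ethernet frame, or NULL if the frame cannot be parsed safely.
 * caplen is the number of bytes actually captured (header->caplen in pcap). */
static const uint8_t *tcp_payload_of(const uint8_t *pkt, size_t caplen,
                                     size_t *payload_len)
{
    if (caplen < ETH_HDR_LEN + 20) return NULL;           /* too short for IPv4 */
    if (pkt[12] != 0x08 || pkt[13] != 0x00) return NULL;  /* EtherType is not IPv4 */

    const uint8_t *ip = pkt + ETH_HDR_LEN;
    size_t ip_hlen = (size_t)(ip[0] & 0x0f) * 4;
    if ((ip[0] >> 4) != 4 || ip_hlen < 20) return NULL;   /* not IPv4 or bad IHL */
    if (ip[9] != 6) return NULL;                          /* protocol is not TCP */

    /* Flags and fragment offset are stored big-endian on the wire. */
    uint16_t flags_fo = (uint16_t)((ip[6] << 8) | ip[7]);
    if ((flags_fo & 0x2000) || (flags_fo & 0x1fff))
        return NULL;                  /* MF bit set or offset != 0: a fragment */

    size_t ip_total = (size_t)((ip[2] << 8) | ip[3]);     /* IP total length */
    size_t avail = caplen - ETH_HDR_LEN;
    if (ip_total > avail) ip_total = avail;               /* truncated capture */
    if (ip_total < ip_hlen + 20) return NULL;             /* no full TCP header */

    const uint8_t *tcp = ip + ip_hlen;
    size_t tcp_hlen = (size_t)(tcp[12] >> 4) * 4;         /* data offset field */
    if (tcp_hlen < 20 || ip_hlen + tcp_hlen > ip_total) return NULL;

    *payload_len = ip_total - ip_hlen - tcp_hlen;         /* may be 0 (pure ACK) */
    return tcp + tcp_hlen;
}
```

In a `packet_handler` like the one in the question, this could be called as `tcp_payload_of(pkt_data, header->caplen, &len)`. A returned length of 0 means the segment carries no data at all, in which case reading `tcp_payload + 4` would run past the packet.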

1 Answer

To parse an HTTP request, you need to break the request into lines and parse each line on its own. The first line is the request line itself, followed by the headers and finally the data.

Each line ends with a "\r\n" sequence, so you should search for that sequence; the first line will always be the request line.

For example

char *end_of_line;
/* strstr needs a NUL-terminated string; raw packet data is not guaranteed to be one */
end_of_line = strstr((char *)tcp_payload, "\r\n");
if (end_of_line != NULL) {
    // Your request line goes from `tcp_payload' to `end_of_line'.
}
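
Because captured packet data is not NUL-terminated, a length-bounded search may be safer than `strstr` here. The following is a minimal sketch under that assumption; `find_crlf` is a hypothetical helper name, and `len` is assumed to be the number of payload bytes actually present in the packet.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the first "\r\n" within the first len bytes of
 * buf. Returns a pointer to the '\r', or NULL if no complete line ending is
 * present in that range. Never reads past buf + len. */
static const char *find_crlf(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;

    while (p < end && (p = memchr(p, '\r', (size_t)(end - p))) != NULL) {
        if (p + 1 < end && p[1] == '\n')
            return p;      /* found a complete CRLF */
        p++;               /* lone '\r': keep searching after it */
    }
    return NULL;
}
```

If the function returns NULL, the request line is either not in this segment or continues in a later one, so there is nothing to parse yet.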

Parsing HTTP requests is far from simple. You should be aware of that.

Since your code is working sometimes, it's plausible that you are doing something that is undefined behavior somewhere in the code that you didn't post.

Iharob Al Asimi
  • Thank you. I search for the first ' ' character; do you mean that isn't correct? Sorry for asking again: by "request line", do you mean the GET? – user7082002 Oct 27 '16 at 18:59
  • No, I mean you should break the request into lines and then parse each line. The 0th line is the request line with the HTTP command, such as `GET`, `POST`, `PUT`, etc. The next line is a header, and all lines are headers until `"\r\n\r\n"` is found. This is way more complicated; I am just giving you an idea. For example, you could have a `Content-Length` header, which you should take into account, or a `Content-Encoding` header, which makes it even more complicated. – Iharob Al Asimi Oct 27 '16 at 19:01
  • The request line is `GET /what/to/get HTTP/1.1`, so you can split it into three elements: the command `GET`, the document `/what/to/get`, and the HTTP version, which is not necessarily `1.1`. – Iharob Al Asimi Oct 27 '16 at 19:03
  • I am trying to do exactly that, am I not? `url = tcp_payload + 4` should skip "GET ", `end_url = strchr((char*)url, ' ');` should find the end of the URL, and the difference is the length (`url_length = end_url - url;`). Am I wrong? Also, most of the time, in a ratio of about 7:3, I get the URL string right. – user7082002 Oct 27 '16 at 19:05
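
The three-element split described in these comments can be sketched without assuming the method is `GET` or that the buffer is NUL-terminated. This is only an illustration: `struct span` and `split_request_line` are hypothetical names, and the `HTTP/` prefix check is an added assumption used to reject payloads that are not the start of a request, such as a later segment beginning in the middle of a header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view into one part of the request line (not NUL-terminated). */
struct span { const char *ptr; size_t len; };

/* Splits "METHOD SP target SP HTTP/x.y" found in line[0..len), where len
 * excludes the trailing "\r\n". Returns 1 on success, 0 if the line does
 * not have that shape. */
static int split_request_line(const char *line, size_t len,
                              struct span *method, struct span *target,
                              struct span *version)
{
    const char *end = line + len;

    const char *sp1 = memchr(line, ' ', len);
    if (sp1 == NULL || sp1 == line) return 0;            /* no or empty method */

    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(end - (sp1 + 1)));
    if (sp2 == NULL || sp2 == sp1 + 1) return 0;         /* no or empty target */

    const char *ver = sp2 + 1;
    if ((size_t)(end - ver) < 5 || memcmp(ver, "HTTP/", 5) != 0)
        return 0;                                        /* not an HTTP version */

    method->ptr  = line;    method->len  = (size_t)(sp1 - line);
    target->ptr  = sp1 + 1; target->len  = (size_t)(sp2 - (sp1 + 1));
    version->ptr = ver;     version->len = (size_t)(end - ver);
    return 1;
}
```

The target can then be printed without copying, for example with `printf("%.*s\n", (int)target.len, target.ptr)`. A payload such as `cept-Encoding: gzip` is rejected instead of being printed as a URL.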