
I'm trying to capture the URL from a UDP payload using libpcap in C with POSIX regex. I have tried every method I can think of, but nothing returns a hit.

Below is the part of my code where I'm trying to capture the URL that comes in the UDP payload.

size_udp = 8;

udp = (struct sniff_udp*)(pktptr + ETHER_HDRLEN + size_udp);
payload_udp = (u_char *)(pktptr + ETHER_HDRLEN + size_ip + size_udp);

size_payload_udp = ntohs(ip->ip_len) - (size_ip + size_udp);

int reg,sh;
regex_t re;
regmatch_t pm;
char *hit;

reg = regcomp(&re, ( "\.youtube\.com", "\.googlevideo\.com","ytimg"), REG_EXTENDED); 
sh = regexec(&re, &payload_udp, 2, &pm, REG_EXTENDED);

strcpy(hit, payload_udp + (pm.rm_so - pm.rm_eo));

if(
   (strstr(hit,"youtube")     != NULL) 
|| (strstr(hit,"googlevideo") != NULL) 
|| (strstr(hit,"video")       != NULL) 
|| (strstr(hit,"ytimg")       != NULL)
)
{
    //Writing to dump file
    pcap_dump(usr, pkthdr, pktptr - lnkhdrlen);
}

This is my code. I would like to know why the regex doesn't match the YouTube URL in the UDP payload.

Thank you for your suggestions.

Question edited by bitcell, asked by Nishaero.

1 Answer


One possible reason is this line:

reg = regcomp(&re, ( "\.youtube\.com", "\.googlevideo\.com","ytimg"), REG_EXTENDED); 

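In your second argument the expressions concerning youtube and googlevideo are unused: the parentheses turn the list into a comma-operator expression, which evaluates each operand and yields only the last one. That is, what is actually compiled is this: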

reg = regcomp(&re, "ytimg", REG_EXTENDED); 

Your compiler should have warned about this...
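A minimal sketch of what the call was probably meant to be, assuming the goal is to match any of the three names: POSIX extended regexes express "any of these" with | alternation inside a single pattern string. Note also that "\." is not a valid escape sequence in a C string literal (GCC, for example, warns and treats it as a plain '.'), so to hand the regex engine a backslash you write "\\.". The function name matches_video_host is hypothetical, and it compiles the pattern on every call only to stay self-contained.

```c
#include <regex.h>
#include <stdio.h>

/* One pattern with "|" alternation instead of three comma-separated
 * literals. In a C string literal, "\\." yields the two characters
 * '\' and '.', which the regex engine reads as a literal dot. */
static const char *VIDEO_HOST_PATTERN =
    "\\.youtube\\.com|\\.googlevideo\\.com|ytimg";

/* Hypothetical helper: returns 1 if text (a NUL-terminated string)
 * contains any of the names, 0 otherwise, -1 if regcomp fails. */
int matches_video_host(const char *text)
{
    regex_t re;
    int rc = regcomp(&re, VIDEO_HOST_PATTERN, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char err[128];
        regerror(rc, &re, err, sizeof err);
        fprintf(stderr, "regcomp failed: %s\n", err);
        return -1;
    }
    /* With REG_NOSUB no offsets are reported; the return value alone
     * says whether there was a match (0) or not (REG_NOMATCH). */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In a pcap callback you would rather call regcomp once at startup, reuse the compiled regex_t for every packet, and call regfree at shutdown.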

Moreover, in

sh = regexec(&re, &payload_udp, 2, &pm, REG_EXTENDED);

some of the arguments do not make sense. pm is only one match structure, yet you tell regexec that it can store 2 matches. &payload_udp is the address of the pointer to your payload, not a pointer to the string you are searching. REG_EXTENDED is only needed when compiling the regex, not when executing it (regexec takes different flags, such as REG_NOTBOL and REG_NOTEOL). sh, the return value, already tells you whether there was a match (0) or not (REG_NOMATCH), so there is no need to copy the match and call strstr.

By the way, your strcpy will copy, without any limit, to whatever arbitrary memory location the uninitialized pointer hit happens to point to, and it will keep copying until it finds a '\0' byte.
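A minimal sketch of the corrected call, assuming re was compiled without REG_NOSUB so that offsets are reported: pass the string itself, a single regmatch_t with nmatch set to 1, and eflags 0, then copy the match using its length (rm_eo - rm_so) into a buffer the caller owns instead of through an uninitialized pointer. extract_match is a hypothetical helper name, and text must still be NUL-terminated.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: searches the NUL-terminated text for the compiled
 * regex re. On a match, copies the matched substring (truncated to fit)
 * into out and returns 1; returns 0 if there is no match. */
int extract_match(const regex_t *re, const char *text, char *out, size_t outsz)
{
    regmatch_t pm; /* one slot: the whole match (subexpression 0) */

    /* nmatch = 1 matches the single regmatch_t; eflags = 0 because
     * REG_EXTENDED is a regcomp flag, not a regexec flag. */
    if (regexec(re, text, 1, &pm, 0) != 0)
        return 0;

    if (outsz == 0)
        return 1;

    /* rm_so is the start offset, rm_eo is one past the end. */
    size_t len = (size_t)(pm.rm_eo - pm.rm_so);
    if (len >= outsz)
        len = outsz - 1; /* leave room for the terminator */
    memcpy(out, text + pm.rm_so, len);
    out[len] = '\0';
    return 1;
}
```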

Finally, regexec expects a null-terminated string. If your UDP payload is not a null-terminated string (or at least does not start with the null-terminated string you want to match against), the regexec approach will not help.
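One possible way to get something regexec can safely scan, sketched under the assumption that you only need to know whether the pattern occurs somewhere in the bytes: copy at most len bytes of the payload into your own buffer, replace NUL and other non-printable bytes with a placeholder, and terminate the copy. The name payload_to_cstring and the '.' placeholder are hypothetical choices, not part of libpcap or POSIX. Because the placeholder is '.', a pattern such as \\.youtube will also match where a non-printable byte sits right before "youtube".

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies at most outsz - 1 bytes of a length-delimited
 * payload into out, replacing NUL and other non-printable bytes with '.',
 * and always terminates out with '\0'. Returns the number of bytes copied. */
size_t payload_to_cstring(const unsigned char *payload, size_t len,
                          char *out, size_t outsz)
{
    if (outsz == 0)
        return 0;

    size_t n = len < outsz - 1 ? len : outsz - 1;
    for (size_t i = 0; i < n; i++)
        out[i] = isprint(payload[i]) ? (char)payload[i] : '.';
    out[n] = '\0';
    return n;
}
```

In the callback you would then call something like payload_to_cstring(payload_udp, size_payload_udp, buf, sizeof buf) and pass buf to regexec. Payload bytes beyond sizeof buf - 1 are dropped, so size the buffer for the largest payload you care about.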

Answered by subsub.
  • Hi, thank you for your reply. I have tried it even without the escape character '\', just like 'youtube', 'googlevideo', but even that didn't seem to work. Could you please let me know if my logic is right, both for finding the UDP payload and for matching the string with the strstr function? – Nishaero Nov 12 '14 at 14:40
  • I extended the answer. The more I think about it, the more I suspect the program may crash because of bad memory accesses. As for the logic: you don't need strstr for checking, just check the return value of regexec. But if your UDP payload does not contain a null-terminated string at the beginning, your logic, or rather your overall design, is wrong. If the payload starts with an int whose value is, say, 15, then there is already a null byte right there. If your payload is the string you want to search but does not end with a null byte, regexec will read more than just the payload. – subsub Nov 12 '14 at 15:10
  • Thank you for your quick reply, subsub. I will rework the code. I understand the bad memory issue and would like to fix it; any suggestions? I will recode it and post back whether the issue gets resolved or persists. – Nishaero Nov 12 '14 at 15:49
  • Hi subsub, I tried your suggestion but found out that the UDP payload does not contain a null-terminated string. When I check in Wireshark, I can see www.youtube.com in the packet data, but I don't know how to capture it. If you have any idea, could you please let me know? – Nishaero Nov 12 '14 at 22:37
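For the situation in the last comment (the payload is not a null-terminated string), a regex may not be needed at all if you only want to know whether a name such as youtube appears. A search bounded by the payload length (size_payload_udp in the question's code) never relies on a '\0' terminator and simply continues past embedded null bytes. find_bytes is a hypothetical helper written out by hand; glibc and the BSDs also provide memmem() as a non-standard extension that does the same job.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first occurrence of needle
 * (nlen bytes) inside buf (len bytes), or NULL if there is none. Unlike
 * strstr, it is bounded by len and keeps scanning past embedded NUL bytes.
 * An empty needle is treated as "no match" here. */
const unsigned char *find_bytes(const unsigned char *buf, size_t len,
                                const char *needle, size_t nlen)
{
    if (nlen == 0 || nlen > len)
        return NULL;

    for (size_t i = 0; i <= len - nlen; i++) {
        /* Cheap first-byte check before the full comparison. */
        if (buf[i] == (unsigned char)needle[0] &&
            memcmp(buf + i, needle, nlen) == 0)
            return buf + i;
    }
    return NULL;
}
```

In the callback this would replace the strcpy/strstr sequence, e.g. calling pcap_dump only when find_bytes(payload_udp, size_payload_udp, "youtube", strlen("youtube")) returns non-NULL.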