
What would be the most efficient way to grab the destination IP (the address after the `>`) and the `User-Agent:` header from the output below? I would like to dump those two values into a file, one request per line, with the IP first followed by the user agent. I would like to minimize system resources, since this will be running 24x7, with the log flushed periodically.

"    1.1.1.1.58477 > 98.139.239.224.80: Flags [P.], cksum 0x2a6c (correct), seq 1:431, ack 1, win 17520, length 430
E.../y@.~...K...b....m.Px9.Iim/.P.Dp*l..GET /images/40eb913b4b20614fad042dc816d412fe_48.jpeg HTTP/1.1^M
Accept: image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5^M
Referer: http://sports.yahoo.com/news/nascar--scene-at-daytona--was-like-a-war-zone--005423629.html^M
Accept-Language: en-US^M
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)^M
Accept-Encoding: gzip, deflate^M
Host: socialprofiles.zenfs.com^M
DNT: 1^M
Connection: Keep-Alive^M"   

Adding a URL for the well-formatted original output: docs.google.com/file/d/0B1umMHxdWKkdNzI3anBaemhuOVE/edit?usp=sharing

T 15.127.111.221:64300 -> 198.252.206.16:80 [AP]
GET /posts/15048809/ivc/29bb?_=1362111021654 HTTP/1.1.
Host: stackoverflow.com.
Connection: keep-alive.
Accept: */*.
X-Requested-With: XMLHttpRequest.
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22.
Referer: http://stackoverflow.com/questions/15048809/tcpdump-header-info-grep-or-awk-or-sed.
Accept-Encoding: gzip,deflate,sdch.
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3.
Cookie: __qca=P0-1081552782-122603326; sgt=id=9
sectech
  • I've attempted to edit your log file and I'm not sure if I've got it right. Please [edit](http://stackoverflow.com/posts/15048809/edit) your question to correct it. Also, adding some sample output greatly improves clarity. – Steve Feb 24 '13 at 12:35
  • Here is the raw log. It's a tcpdump; zipped it's 3 MB, unzipped it's 11 MB. https://docs.google.com/file/d/0B1umMHxdWKkdYWtibmgyNkE3Um8/edit?usp=sharing – sectech Feb 24 '13 at 18:01

2 Answers


It would be easier to answer your question exactly if your problem description included an exact sample of your required output. Until then, here is a general idea of how to proceed.

EDIT

$ awk '/Flags/{sub(/\.80:$/, "", $4); printf "%s\t", $4} /User-Agent/{sub(/^[^:]*: */, ""); print}' logTest

output

98.139.239.224   Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)^M

I'm leaving the matched field as $4, since that is what matches my rendition of your sample data. Per your comments, you can easily change it to $3.

Note I've used a tab as the field separator between the IP and the User-Agent.
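A variation on the one-liner above, as a minimal sketch rather than a drop-in replacement: it looks for the field that follows `>` instead of a fixed `$3` or `$4`, strips whatever port is attached, and drops the trailing `^M`. The function name `extract_ip_ua` is made up for illustration, and the sketch assumes the input looks like the `tcpdump -n -A` sample in the question (a `Flags` line per packet, with `User-Agent:` at the start of its own line).

```shell
# Hypothetical helper: reads tcpdump -n -A style text on stdin and
# writes "dest_ip<TAB>user_agent" lines on stdout.
extract_ip_ua() {
  awk '
    # Packet header line: take the field after ">" so it does not
    # matter whether the destination is $3 or $4.
    /Flags/ {
      for (i = 1; i < NF; i++) {
        if ($i == ">") {
          ip = $(i + 1)
          sub(/\.[0-9]+:$/, "", ip)   # drop the ".port:" suffix
          break
        }
      }
      next
    }
    # Header line: print it next to the most recent destination IP.
    /^User-Agent:/ {
      sub(/^User-Agent: */, "")
      sub(/\r$/, "")                  # drop the trailing carriage return (^M)
      printf "%s\t%s\n", ip, $0
    }
  '
}
```

It could be fed from a saved log (`extract_ip_ua < logTest`) or, assuming a live capture is wanted, from something like `tcpdump -n -A -l 'tcp dst port 80' | extract_ip_ua >> ua.log`; the filter expression and the output file name there are only examples.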

IHTH.

shellter
  • Hi, I'm getting "awk: line 1: runaway string constant ");print} ..." when I try to run your awk command. What I'm trying to do is get just the destination IP and the user agent string stripped out of the tcpdump and into a new file. Does that help? – sectech Feb 24 '13 at 20:35
  • 1. Does my sample output look like what you want? Did I select the right data? Will the file always have the IP of interest on the line with Flags, and will it be the 4th item on the line? 2. What OS are you using? If not Linux, try using `nawk` instead of `awk`. Good luck. – shellter Feb 24 '13 at 23:09
  • 1. I'm getting an error running it; read my response above. 2. Using Debian squeeze. awk works great in my other uses. – sectech Feb 25 '13 at 01:01
  • 0. I still don't know if I'm helping you solve the right problem, please consider updating your question with your required output. (1). I see now I missed a closing dbl-quote and have edited my answer. As a rule of thumb, you always need to check for matching pairs of dbl-and-single quote chars. Sorry and good luck. – shellter Feb 25 '13 at 02:57
  • Yes, I tried messing around with your awk, and even with what you added I'm getting this output: "Flags Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0 Flags Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0 Flags Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17 Flags Mozilla/4.0 (compatible; NativeHost) " – sectech Feb 25 '13 at 03:05
  • So it's not picking up the IP; it's actually printing the word "Flags" :) BUT I am getting the user agent string!! – sectech Feb 25 '13 at 03:06
  • I changed the $4 to a $3 and I'm getting the IP!! Only now it's adding that last part, .www:, onto it. 68.142.213.143.www: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0) 68.142.213.143.www: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0) 24.143.206.9.www: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0) – sectech Feb 25 '13 at 03:11
  • I don't see 'www' anywhere in the sample data above. Sorry, but I won't have time to download a 16 MB file to sort this out. Sounds like you're making progress. You can always say `sub(/\.www/, "", $0)` just before you print it. Good luck. – shellter Feb 25 '13 at 04:11
  • Thanks for pointing me in the right direction. The www was my fault: I forgot to add the -n flag on the tcpdump. As soon as I did that it removed the www but added the darn port, .80:, so I'm trying to strip that now :( I tried the sub and switched www with .80: but it actually added a :1 to the .80:, so it looks like this: 198.252.206.16.80:1. No worries, I appreciate all your help. I will keep at it; I'm getting closer every time. Thanks! – sectech Feb 25 '13 at 04:58
  • see my edit. Good luck, but ***please*** consider including required sample output in your questions going forward. ;-) ! – shellter Feb 25 '13 at 14:58
  • Thanks for the edit. I wanted to add one more field, /Host/, and I did, and it's working, kind of. The output is showing the 3 items I want, but not in the order I want, and I'm not sure why. – sectech Mar 01 '13 at 03:42
  • `awk '/AP/{sub(/:80/, "", $4);printf $4"\t"} /User-Agent/{sub(/^[^:][^:]*:/,"");print};sub(/\.80/,"", $4);/Host/{sub(/^[^:][^:]*:/,""); print}'` That's my modification, but it's showing User-Agent below Host on a separate line. The order would actually be fine, but not the separate line. Should I not modify that awk like that? – sectech Mar 01 '13 at 03:44
  • example output: 72.21.91.121 www.gravatar.com. Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22. – sectech Mar 01 '13 at 04:09
  • For the umpteenth time, please edit your question to include required sample output. I don't want to spend time guessing. Good luck. – shellter Mar 01 '13 at 04:09
  • I don't want to have to guess what format this is in, put that in your question above!!!!!!! – shellter Mar 01 '13 at 04:10
  • Sorry, I'm going to bed. Good luck. You have a lot of great answers in your other question that cover essentially the same subject. Good luck. – shellter Mar 01 '13 at 04:16
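
For the three-field variant discussed in the comments above (IP, Host, User-Agent on one line), one likely cause of the split output is that both header rules call `print`, so each header ends its own line, and the order simply follows the order of the headers in the request. A minimal sketch of one way around that, assuming the second, ngrep-style sample from the question (`T src:port -> dst:port [AP]`, with each header line ending in a `.` that stands in for the carriage return): remember the values and print them once per packet. The function names `extract_ip_host_ua`, `emit` and `header_value` are made up for illustration.

```shell
# Hypothetical helper: reads ngrep-style text on stdin and writes
# "dest_ip<TAB>host<TAB>user_agent", one line per request.
extract_ip_host_ua() {
  awk '
    # Print the values collected for the previous packet, then reset.
    function emit() {
      if (ip != "" && (host != "" || ua != ""))
        printf "%s\t%s\t%s\n", ip, host, ua
      ip = host = ua = ""
    }
    # Strip the header name and the trailing "." (assumed to be the
    # rendering of the carriage return, as in the sample).
    function header_value(line) {
      sub(/^[^:]*: */, "", line)
      sub(/\.$/, "", line)
      return line
    }
    # Packet header line: $4 is "dst:port"; drop the port.
    /^T [0-9.]+:[0-9]+ -> / {
      emit()
      ip = $4
      sub(/:[0-9]+$/, "", ip)
      next
    }
    /^Host:/       { host = header_value($0) }
    /^User-Agent:/ { ua = header_value($0) }
    END { emit() }
  '
}
```

Because the line is only written when the next packet header (or end of input) arrives, the output order is fixed regardless of whether `Host:` comes before or after `User-Agent:` in the request.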

Try justniffer (http://justniffer.sourceforge.net/). It is a better tool than tcpdump for analyzing HTTP protocol headers.