I'm struggling to write a regex that extracts the 'Contact' and 'User-Agent' headers, but only when the 'Contact' address matches 10.0.x.x, from ~70GB of SIP messages.

Each SIP message always contains a 'Contact' and a 'User-Agent' header. They can appear at any position, but 'User-Agent' always comes after 'Contact'.

"If 'Contact' matches 10.0.x.x then grab the 'User-Agent' too".

CSeq: 756 REGISTER
10.0.54.20;branch=z9hG4bK314690454165BD2A;rport=49419;received=133.55.155.196
Contact: <sip:43498234985@10.0.23.71:5060;transport=udp>;methods="INVITE, ACK, BYE,           CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER";expires=3600
Accept-Language: en-gb,en;q=0.9
User-Agent: PolycomSoundPointIP-SPIP_331-UA/4.0.2.11307
Max-Forwards: 69

I can match the Contact, but I can't pull the User-Agent as well.

sed -rn 's/.*(^Contact: .*?10\.0\.[0-9]{0,3}\.[0-9]{0,3}).*/\1/p' XSLog2013.01.31-23.31.29.txt

Outputs: Contact: sip:442023482890@10.0.23.71

I get no output with:

sed -rn 's/.*(^Contact: .*?10\.0\.[0-9]{0,3}\.[0-9]{0,3}).*?(^User-Agent:.*?$).*/\1\2/p' XSLog2013.01.31-23.31.29.txt
Luke B

1 Answer

What you are trying to do is multi-line matching. sed processes its input one line at a time, so a single s/// never sees 'Contact' and 'User-Agent' together, and multi-line regex in sed gets complex quickly. If you can understand something like

sed -rn '/^Contact: .*10\.0\.[0-9]{1,3}\.[0-9]{1,3}/h; /^User-Agent:/{x; /^Contact:/{G; p}; s/.*//; x}' inputfile

and the meaning of ;, h, x and G, then you have the answer.

If you don't understand it, the easier way is to write a short script in your favourite language that reads the file line by line and remembers the last matching 'Contact'. For example, a PHP script would be:

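<?php
// Usage: php script.php <logfile>
$inputfile = $argv[1];
$contact = "";
$fp = fopen($inputfile, 'r');
while (($l = fgets($fp)) !== false)
{
 // Remember the most recent Contact header whose address is 10.0.x.x.
 if (preg_match('!^(Contact: .*?10\.0\.[0-9]{1,3}\.[0-9]{1,3})!', $l, $match))
 {
  $contact = $match[1];
 }
 // The match is case-sensitive, so the header must be spelled "User-Agent".
 if (preg_match('!^User-Agent:!', $l) && '' != $contact)
 {
  echo $contact."\n".rtrim($l)."\n=======\n";
  $contact = "";
 }
}
fclose($fp);

then run with php script.php XSLog2013.01.31-23.31.29.txt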

You can also search for "multiline regex in sed" for other approaches.
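
The same "remember the Contact, print it when User-Agent arrives" idea can also be sketched in awk, which reads the file once and may be more comfortable than sed for this kind of state. This is only a sketch under the assumptions stated in the question (every message has both headers, 'User-Agent' after 'Contact'). The function name sip_contact_agents is made up for illustration, and the pattern uses [0-9]+ instead of {1,3} because not every awk supports interval expressions.

```shell
# Hypothetical helper: print each Contact line whose address is 10.0.x.x,
# followed by the next User-Agent line. $1 is the log file to scan.
sip_contact_agents() {
  awk '
    # Remember a Contact header that contains a 10.0.x.x address.
    /^Contact: .*10\.0\.[0-9]+\.[0-9]+/ { contact = $0; next }
    # On User-Agent, print the pair only if a matching Contact is pending.
    /^User-Agent:/ && contact != "" { print contact; print $0; contact = "" }
  ' "$1"
}
```

Call it as sip_contact_agents XSLog2013.01.31-23.31.29.txt. Unlike the sed and PHP versions it prints the whole Contact line rather than only the part up to the address.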

DotMG