I'm pretty new to Perl. My Perl program reads the HTTP request message from a browser, and I want to detect the final blank line.

I tried using $_ =~ /\S/, but it doesn't work:

while (<CONNECTION>) {
  print $_;
  if ($_ =~ /\S/) {print "blank line detected\n"; }
}

The output is:

GET / HTTP/1.1
blank line detected
Host: xxx.ca:15000
blank line detected
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0
blank line detected
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
blank line detected
Accept-Language: en-us,en;q=0.7,zh-cn;q=0.3
blank line detected
Accept-Encoding: gzip, deflate
blank line detected
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
blank line detected
Connection: keep-alive
blank line detected
Cookie: __utma=32770362.1159201788.1291912625.1308033463.1309142872.11; __utmz=32770362.1307124126.7.3.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=Manitoba%20Locum%20Tenens%20Program; __utma=70597634.1054437369.1308785920.1308785920.1308785920.1; __utmz=70597634.1308785920.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=leung%20carson
blank line detected



I also tried chomp(), which doesn't work for me either:

  while (<CONNECTION>) {
    chomp(); 
    print "$_\n";
    if ($_ eq "") {print "blank line detected\n"; }
  }

The output:

GET / HTTP/1.1
Host: xxx.ca:15000
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,zh-cn;q=0.3
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Cookie: __utma=32770362.1159201788.1291912625.1308033463.1309142872.11; __utmz=32770362.1307124126.7.3.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=Manitoba%20Locum%20Tenens%20Program; __utma=70597634.1054437369.1308785920.1308785920.1308785920.1; __utmz=70597634.1308785920.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=leung%20carson

Thanks in advance~

draw

2 Answers

To detect lines with nothing but whitespace,

while (<CONNECTION>) {
  print $_;
  if ($_ =~ /\S/) {print "blank line detected\n"; }
}

should be

while (<CONNECTION>) {
   print $_;
   if ($_ !~ /\S/) {print "blank line detected\n"; }
}

Or for short,

while (<CONNECTION>) {
   print;
   if (!/\S/) {print "blank line detected\n"; }
}

The reason

while (<CONNECTION>) {
   chomp(); 
   print "$_\n";
   if ($_ eq "") {print "blank line detected\n"; }
 }

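might not work is that HTTP header lines end with \r\n. With the default $/, chomp() removes only the trailing \n, so $_ still contains \r and never equals the empty string. You'd need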

while (<CONNECTION>) {
   s/\r?\n//; 
   print "$_\n";
   if ($_ eq "") {print "blank line detected\n"; }
 }
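
A minimal self-contained sketch of the same idea, assuming the request arrives with CRLF line endings. The in-memory filehandle and the sample request are stand-ins for CONNECTION, and find_header_end is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Return the 1-based number of the first empty line read from $fh,
# or 0 if no empty line is found. (Hypothetical helper for illustration.)
sub find_header_end {
    my ($fh) = @_;
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        $line =~ s/\r?\n\z//;            # strip a trailing CRLF or bare LF
        return $line_no if $line eq '';  # an empty line ends the headers
    }
    return 0;
}

# Stand-in for the socket: an in-memory filehandle with CRLF line endings.
my $request = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n";
open my $fh, '<', \$request or die "cannot open in-memory handle: $!";
my $blank_at = find_header_end($fh);
print "blank line detected at line $blank_at\n";
```

Anchoring the substitution with \z is a small tightening over s/\r?\n//, so only the line terminator at the very end of the string is removed.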
ikegami
  • `HTTP header lines end with \r\n`, so does Linux also add '\r' besides '\n'? – draw Jun 30 '11 at 20:51
  • @draw, Linux doesn't add anything. HTTP clients (e.g. browsers) use \r\n to end lines, as per the protocol. – ikegami Jun 30 '11 at 20:52
  • @ikegami, thanks. How do you know the newline is `\r\n`? I failed to find it on `w3.org`. Actually, I'm curious about the structure of a header line. – draw Jun 30 '11 at 21:04
  • @draw => the gory details can be found here: http://www.ietf.org/rfc/rfc2616.txt (just search for CRLF) – Eric Strom Jun 30 '11 at 21:20
  • Using \r\n is potentially fragile. Better to be explicit and use \015\012. See the section on newlines in "perldoc perlport" for details. – Dave Cross Jul 01 '11 at 08:29
  • @davorg, Not at all. No need to kowtow to obsolete Macs – ikegami Jul 01 '11 at 08:58

I think you want:

/^\s+$/

for blank line detection.

Using /\S/ will detect NON-blank lines.

Better yet, use something like Net::HTTP or LWP to do the heavy lifting for you. Some of the HTTP encoding issues are subtle.

Seth Robertson
  • `/^\s*$/` or the less redundant `/^\s*\z/` would be a bit more flexible than `/^\s+$/`. – ikegami Jun 30 '11 at 20:46
  • @ikegami: This is not an arbitrary string, but rather part of an HTTP protocol message. We are not going to be seeing an empty string here. – Seth Robertson Jun 30 '11 at 20:49
  • So you're saying it would hurt to use `/^\s*\z/`? Claiming that `/^\s+\z/` detects blank lines, on the other hand... – ikegami Jun 30 '11 at 20:50
  • @ikegami: It is unlikely to hurt, but is also going to match lines even further away from the standard than mine does. Really you should just ($_ eq "\r\n") and be done with it. – Seth Robertson Jun 30 '11 at 21:25
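
The three checks debated in the comments above can be compared side by side. This is only an illustrative sketch: the sample inputs are assumptions chosen to show where the checks differ, and the strict comparison is written with \015\012, the explicit octal spelling of CRLF suggested in the comments on the other answer.

```perl
use strict;
use warnings;

# Three candidate "is this line blank?" tests from the discussion above.
# Each entry is [ label, code ref returning 1 or 0 ].
my @tests = (
    [ '/^\s+$/'   => sub { $_[0] =~ /^\s+$/    ? 1 : 0 } ],
    [ '/^\s*\z/'  => sub { $_[0] =~ /^\s*\z/   ? 1 : 0 } ],
    [ 'eq "\r\n"' => sub { $_[0] eq "\015\012" ? 1 : 0 } ],
);

# Sample inputs: a CRLF blank line, a bare LF, an empty string
# (e.g. after the line ending was already stripped), and a header line.
my @inputs = ("\015\012", "\012", "", "Host: example.com\015\012");

for my $t (@tests) {
    my ($name, $check) = @$t;
    my @results = map { $check->($_) } @inputs;
    print "$name: @results\n";
}
```

On these inputs, /^\s+$/ rejects the empty string, which matters only if the line ending has already been removed, while the strict comparison accepts nothing but an exact CRLF and so rejects a bare LF.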