What is the best way to receive data from a socket in Perl, when the data length is unknown?

Question

Right now, I read one character at a time in a loop, until I reach the \0 character. Is there a better way to do this?

score 8 · Answer 1 · answered Jul 18 '10 at 10:22


Set the input record separator ($/) to \x{00} (\0), be sure to localise it, and call getline on the handle, like so:

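{
    local $/ = "\x{00}";    # restored when the block exits
    while (defined(my $line = $sock->getline)) {
        chomp $line;        # strip the trailing \0 (chomp removes the current $/)
        print "$line\n";    # do whatever with your data here
    }
}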
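If you want to try the $/ approach without a server, here is a minimal sketch that uses a local socketpair as a stand-in for $sock (the read_nul_records name is just for illustration). It chomps each record, because getline leaves the \0 on the end, and tests with defined so an empty record doesn't end the loop early.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;    # loads IO::Handle, so ->getline works on plain handles
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Return every \0-terminated record from $fh, with the terminator removed.
# (read_nul_records is an illustrative name, not a library function.)
sub read_nul_records {
    my ($fh) = @_;
    local $/ = "\0";                     # record separator, restored when the sub returns
    my @records;
    while (defined(my $rec = $fh->getline)) {
        chomp $rec;                      # chomp strips the current $/, i.e. the \0
        push @records, $rec;
    }
    return @records;
}

# Demo: a local socket pair stands in for the real peer.
socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
print {$writer} "first\0second\0third\0";
close $writer;                           # flushes, then EOF ends the getline loop

my @got = read_nul_records($reader);
print scalar(@got), " records: @got\n";
```

A final record with no trailing \0 is still returned, just without anything to chomp.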

MkV

score 3 · Answer 2 · answered Jul 18 '10 at 12:04

You could use the FIONREAD ioctl to find out how many bytes are waiting to be read. The program below connects to the SSH server on localhost and waits for its greeting:

#! /usr/bin/perl

use warnings;
use strict;

use subs 'FIONREAD';
# sys/ioctl.ph is generated from the C headers by h2ph; it supplies FIONREAD
require "sys/ioctl.ph";
use Socket;
socket my $s, PF_INET, SOCK_STREAM, getprotobyname "tcp"
  or die "$0: socket: $!";
connect $s, sockaddr_in 22, inet_aton "localhost"
  or die "$0: connect: $!";

my $rin = "";
vec($rin, fileno($s), 1) = 1;
my $nfound = select(my $rout = $rin, "", "", undef);
die "$0: select: $!" if $nfound < 0;

if ($nfound) {
  my $size = pack "L", 0;
  ioctl $s, FIONREAD, $size
    or die "$0: ioctl: $!";

  print unpack("L", $size), "\n";
  # sysread returns 0 at EOF and undef on error; only undef is a failure
  defined(sysread($s, my $buf, unpack("L", $size)))
    or die "$0: sysread: $!";

  my $length = length $buf;
  $buf =~ s/\r/\\r/g;
  $buf =~ s/\n/\\n/g;
  print "got: [$buf], length=$length\n";
}

Sample run:

$ ./howmuch
39
got: [SSH-2.0-OpenSSH_5.3p1 Debian-3ubuntu4\r\n], length=39

But you'll probably prefer using the IO::Socket::INET and IO::Select modules, as in the code below, which talks to Google:

#! /usr/bin/perl

use warnings;
use strict;

use subs "FIONREAD";
require "sys/ioctl.ph";
use IO::Select;
use IO::Socket::INET;

my $s = IO::Socket::INET->new(PeerAddr => "google.com:80")
  or die "$0: can't connect: $@";

my $CRLF = "\015\012";
print $s "HEAD / HTTP/1.0$CRLF$CRLF" or warn "$0: print: $!";

my @ready = IO::Select->new($s)->can_read;
die "$0: can_read returned an unexpected handle" unless @ready && $ready[0] == $s;

my $size = pack "L", 0;
ioctl $s, FIONREAD, $size
  or die "$0: ioctl: $!";

print unpack("L", $size), "\n";
defined(sysread($s, my $buf, unpack("L", $size)))
  or die "$0: sysread: $!";

my $length = length $buf;
$buf =~ s/\r/\\r/g;
$buf =~ s/\n/\\n/g;
print "got: [$buf], length=$length\n";

Output:

573
got: [HTTP/1.0 200 OK\r\nDate: Sun, 18 Jul 2010 12:03:48 GMT\r\nExpires: -1\r\nCache-Control: private, max-age=0\r\nContent-Type: text/html; charset=ISO-8859-1\r\nSet-Cookie: PREF=ID=6742ab80dd810a95:TM=1279454628:LM=1279454628:S=ewNg64020FbnGzHR; expires=Tue, 17-Jul-2012 12:03:48 GMT; path=/; domain=.google.com\r\nSet-Cookie: NID=36=kn2wtTD4UJ3MYYQ5uvA4iAsrS2wcrb_W781pZ1hrVUhUDHrIJTMg_kOgVKhjQnO5SM6MdC_jrRdxFRyXwyyv5N3Xja1ydhVLWWaYqpMHQOmGVi2K5qRWAKwDhCVRd8WS; expires=Mon, 17-Jan-2011 12:03:48 GMT; path=/; domain=.google.com; HttpOnly\r\nServer: gws\r\nX-XSS-Protection: 1; mode=block\r\n\r\n], length=573

`sysread` returns as soon as there is any data available, so we can skip `FIONREAD` and just call `sysread` with a large size. — Sam Watkins, Sep 22 '16 at 07:32

score 2 · Answer 3 · answered Jul 18 '10 at 09:16


What is the best way to receive data from a socket in Perl, when the data length is unknown?

A sound solution to this is impossible, in any language. If you don't know how long the data is, then you can't possibly know when you've finished receiving all of it from the socket.

Your only hope is to use some kind of metric to determine whether it's been "long enough" since data started coming in, and from that decide that the data flow has stopped. But it won't be perfect.
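One common metric of this kind is an idle timeout. The sketch below (the read_until_idle name and the 0.5-second cut-off are illustrative assumptions) uses IO::Select's can_read to wait for data and stops when nothing new arrives within the timeout or the peer closes the connection. A slow sender can still be cut off early, which is exactly the imperfection described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Collect data from $fh until the peer closes it, or until nothing new has
# arrived for $idle seconds. The idle cut-off is a heuristic: it cannot prove
# that the sender has finished. (read_until_idle is a hypothetical helper.)
sub read_until_idle {
    my ($fh, $idle) = @_;
    my $sel  = IO::Select->new($fh);
    my $data = '';
    while (my @ready = $sel->can_read($idle)) {   # empty list: timeout expired
        my $n = sysread $fh, my $chunk, 65536;    # unbuffered, so select stays accurate
        die "sysread: $!" unless defined $n;
        last if $n == 0;                          # EOF: the peer closed its end
        $data .= $chunk;
    }
    return $data;
}

# Demo: the writer sends some bytes but never closes, so only the timeout ends the read.
socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite $writer, "no terminator here" or die "syswrite: $!";
my $got = read_until_idle($reader, 0.5);
print "got: [$got]\n";
```

sysread is used rather than buffered reads because IO::Select only sees the kernel's buffer, not data already sitting in a PerlIO buffer.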

Shaggy Frog

I know that each message is ended with '\o', does that help? – Gal Goldman Jul 18 '10 at 09:19

As long as you know *for sure* that character can't be sent as part of your data stream, it functions like an End-Of-Data marker, in which case you don't need to know the data length and your solution is valid. – Shaggy Frog Jul 18 '10 at 09:20
@Gal: What is `\o`? Do you mean `\0`? – Svante Jul 18 '10 at 09:23

score 2 · Answer 4 · answered Jul 18 '10 at 09:30


The answer depends on the protocol. Since your protocol uses '\0' as a separator, you're doing the right thing. I'm pretty sure Perl handles buffering for you, so reading one character at a time is not inefficient.

Many network-oriented protocols precede strings with a length. To read a protocol like this, you read the length (usually one or two bytes, depending on the protocol spec), then read that many bytes into a string.
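Here is a minimal sketch of reading such a protocol, assuming a hypothetical framing of a 2-byte unsigned big-endian length (pack/unpack "n") followed by the payload. The read_exact and read_message names are illustrative, and a socketpair stands in for the real connection. Because the length tells the reader where each message ends, the payload can safely contain \0.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Read exactly $len bytes. Buffered read() keeps going until it has $len bytes
# or hits EOF, so a short count means the stream ended (or failed) mid-message.
sub read_exact {
    my ($fh, $len) = @_;
    my $got = read $fh, my $buf, $len;
    die "read: $!" unless defined $got;
    die "short read: wanted $len bytes, got $got\n" unless $got == $len;
    return $buf;
}

# Hypothetical framing: a 2-byte unsigned big-endian length, then the payload.
sub read_message {
    my ($fh) = @_;
    my $len = unpack "n", read_exact($fh, 2);
    return read_exact($fh, $len);
}

socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
binmode $_ for $writer, $reader;

# pack "n/a*" writes the payload length as a 2-byte big-endian integer, then the payload.
print {$writer} pack("n/a*", "hello"), pack("n/a*", "nul\0inside");
close $writer;

my $first  = read_message($reader);
my $second = read_message($reader);
printf "%s, %d bytes\n", $first, length $first;
printf "second message is %d bytes\n", length $second;
```

A real protocol spec decides the width and byte order of the length field; "n" would become "C" for one byte or "N" for four bytes, big-endian.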

slim


PerlIO certainly does handle buffering, so 1-char reads don't incur a *syscall* overhead, but they still waste time in the Perl op loop (not to mention the number of string concatenations that might be happening, depending on the code). Not to micro-optimize, but the `$/` + `getline` approach is far more efficient and abundantly clear, so it wins :) – hobbs Jul 18 '10 at 10:34

score 0 · Answer 5 · answered Sep 22 '16 at 07:25

You can use sysread to read whatever data is available:

my $data;
my $max_length = 1000000;
sysread $sock, $data, $max_length;

Perl's read function waits for the full number of bytes that you requested, or EOF.
This is similar to libc stdio fread(3).

Perl's sysread function returns as soon as it receives any data.
This is similar to UNIX read(2).
Note that sysread bypasses buffered IO, so don't mix it with buffered reads such as read, getline or <$sock> on the same handle.
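To see the difference, here is a small sketch using a local socketpair (an assumption standing in for a real server): the writer sends 5 bytes and keeps its end open, so a buffered read of 1000000 bytes would block, but sysread returns the 5 bytes straight away. Once the writer closes, sysread returns 0, which is how it signals EOF; undef means an error.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The writer sends 5 bytes and stays open, so there is no EOF yet.
syswrite $writer, "hello" or die "syswrite: $!";

# sysread returns whatever is already available instead of waiting for 1000000 bytes.
my $n = sysread $reader, my $buf, 1_000_000;
die "sysread: $!" unless defined $n;
print "sysread returned $n bytes: $buf\n";

# After the writer closes its end, sysread returns 0 to signal EOF.
close $writer;
my $eof = sysread $reader, my $rest, 1_000_000;
die "sysread: $!" unless defined $eof;
print "after close: $eof\n";
```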

Check perldoc -f read and perldoc -f sysread for more info.

For this specific question, it would be better to follow the top answer, and use getline with a line-ending of \0, but we can use sysread if there is no terminating character.

Here's a little example. It requests a web page and prints the first chunk of data received.

#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket;

my $host = $ARGV[0] || 'google.com';
my $port = $ARGV[1] || 80;
my $sock = IO::Socket::INET->new(Proto => 'tcp', PeerAddr => $host, PeerPort => $port)
    or die "connect failed: $@";    # IO::Socket::INET reports its error in $@
$sock->autoflush(1);
# use HTTP/1.1, which keeps the socket open by default
$sock->print("GET / HTTP/1.1\r\nHost: $host\r\n\r\n");
my $reply;
my $max_length = 1000000;
# $sock->read($reply, $max_length);   # read would hang waiting for 1000000 bytes
my $count = $sock->sysread($reply, $max_length);
if (!defined $count) {
    die "read failed: $!";
}
print $reply;
