Is returning a whole array from a Perl subroutine inefficient?

Question

I often have a subroutine in Perl that fills an array with some information. Since I'm also used to hacking in C++, I often find myself doing it like this in Perl, using references:

my @array;
getInfo(\@array);

sub getInfo {
   my ($arrayRef) = @_;
   push @$arrayRef, "obama";
   # ...
}

instead of the more straightforward version:

my @array = getInfo();

sub getInfo {
   my @array;
   push @array, "obama";
   # ...
   return @array;
}

The reason, of course, is that I don't want the array to be created locally in the subroutine and then copied on return.

Is that right? Or does Perl optimize that away anyway?

Maybe you could explain at a higher level what you're trying to do. There may be a more modern, Perlish way of writing what you want that would avoid your problem altogether. Or at least make your intentions more clear. — Mark Canlas, Feb 13 '09 at 15:20
@unknown(google): I do this often in different contexts. The last time was when I wanted to read a file into an array. So the file has to be opened, I print info on what file I'm opening on STDERR, modify the lines a bit and push them onto the array, then close the file. — Frank, Feb 13 '09 at 15:33
modify how? like could it be done per line, with a map statement? i find pushing something on an array to be highly suspect, again in favor of other techniques. sometimes it can't be helped, though — Mark Canlas, Feb 13 '09 at 16:20

user55400 · Accepted Answer · 2009-02-13T15:23:34.663

18

What about returning an array reference in the first place?

sub getInfo {
  my $array_ref = [];
  push @$array_ref, 'foo';
  # ...
  return $array_ref;
}

my $a_ref = getInfo();
# or if you want the array expanded
my @array = @{getInfo()};

Edit according to dehmann's comment:

It's also possible to use a normal array in the function and return a reference to it.

sub getInfo {
  my @array;
  push @array, 'foo';
  # ...
  return \@array;
}
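
A minimal sketch of why returning \@array is safe, using only core Perl: each call creates a new lexical array, and the returned reference keeps that array alive after the sub exits. The sub name get_info_ref and its argument are made up for this illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical variant of getInfo() that takes the value to store.
sub get_info_ref {
    my ($value) = @_;
    my @array;               # a fresh lexical array on every call
    push @array, $value;
    return \@array;          # the reference keeps @array alive after return
}

my $first  = get_info_ref('foo');
my $second = get_info_ref('bar');

print "@$first\n";           # prints "foo"
print "@$second\n";          # prints "bar"

# Comparing references numerically compares their addresses.
print "two distinct arrays\n" if $first != $second;
```

The second call does not overwrite the first result, because the first array is still referenced by $first when the sub runs again.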

edited Feb 13 '09 at 15:23

answered Feb 13 '09 at 14:55

That sounds like the best solution to me! – Powerlord Feb 13 '09 at 15:02
Actually, how about creating a real array in the function, but having it return a reference to it? Perl would keep the locally created array alive and return a reference efficiently. – Frank Feb 13 '09 at 15:13
@dehmann: good point, I incorporated your comment into my answer, thanks. – user55400 Feb 13 '09 at 15:24
Why does Perl allow you to return references to variables that have local scope? – pacoverflow Jun 16 '13 at 07:56
@pacoverflow: Perl does refcounting, so while there's a reference somewhere, the variable isn't garbage collected. – choroba Feb 12 '18 at 12:27

score 13 · Answer 2 · answered Feb 13 '09 at 15:11

Passing references is more efficient, but the difference is not as big as in C++. The argument values themselves (that is, the elements of the array) are always passed by reference anyway, because @_ aliases the caller's values. Returned values are copied, though.
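
To illustrate the point about arguments: the elements of @_ are aliases for the caller's values, so nothing is copied until the sub assigns them to its own variables. A small sketch, with hypothetical sub names:

```perl
use strict;
use warnings;

# Modifies the caller's data through the aliases in @_.
sub upcase_in_place {
    $_ = uc($_) for @_;      # each element of @_ aliases the caller's element
    return;
}

# Copies the arguments first, so the caller's data is left alone.
sub upcase_copy {
    my @copy = @_;           # this assignment is where the copy happens
    $_ = uc($_) for @copy;
    return @copy;            # copied once more by the caller's assignment
}

my @names = ('foo', 'bar');

my @upper = upcase_copy(@names);
print "@names\n";            # prints "foo bar"

upcase_in_place(@names);
print "@names\n";            # prints "FOO BAR"
```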

The question is: does it matter? Most of the time, it doesn't. If you're returning 5 elements, don't worry about it. If you're returning or passing 100,000 elements, use references. Only optimize it if it's a bottleneck.
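
One way to check whether it matters for your data is to measure it with the core Benchmark module. This is only a sketch: the sub names and the element count in $SIZE are assumptions for illustration, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $SIZE = 100_000;          # assumed element count; adjust to match your data

sub fill_and_return_list {
    my @array = (1 .. $SIZE);
    return @array;           # elements are copied into the caller's array
}

sub fill_and_return_ref {
    my @array = (1 .. $SIZE);
    return \@array;          # only a single reference is returned
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    list => sub { my @copy = fill_and_return_list(); },
    ref  => sub { my $ref  = fill_and_return_ref();  },
});
```

Try it again with $SIZE set to 5 to see how small the gap is for short arrays.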

score 8 · Answer 3 · edited May 23 '17 at 12:24

Looking at your example and what you want to do, I would normally write it like this:

sub getInfo {
  my @array;
  push @array, 'obama';
  # ...
  return \@array;
}

To me this is the straightforward version when I need to return a large amount of data. There is no need to allocate the array outside the sub, as you did in your first code snippet, because `my` does it for you. In any case, you should not optimize prematurely, as Leon Timmermans suggests.

score 4 · Answer 4 · answered Feb 13 '09 at 15:38

To answer the final rumination, no, Perl does not optimize this away. It can't, really, because returning an array and returning a scalar are fundamentally different.

If you're dealing with large amounts of data or if performance is a major concern, then your C habits will serve you well - pass and return references to data structures rather than the structures themselves so that they won't need to be copied. But, as Leon Timmermans pointed out, the vast majority of the time, you're dealing with smaller amounts of data and performance isn't that big a deal, so do it in whatever way seems most readable.
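
The difference between the two return styles is observable, which is part of why Perl cannot silently swap one for the other. A sketch with hypothetical names, using a file-scoped array so the effect is visible:

```perl
use strict;
use warnings;

my @data = ('foo', 'bar');   # stands in for data owned by some module

sub data_as_list { return @data }    # caller receives copies of the elements
sub data_as_ref  { return \@data }   # caller receives a reference to @data itself

my @copy = data_as_list();
push @copy, 'baz';
print scalar(@data), "\n";   # prints 2: the original is untouched

my $ref = data_as_ref();
push @$ref, 'baz';
print scalar(@data), "\n";   # prints 3: the push went through the reference
```

So returning a reference avoids the copy, but it also means the caller can change data the sub still holds. With a fresh `my` array inside the sub, that sharing is harmless.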

score 2 · Answer 5 · answered Feb 13 '09 at 17:38

This is the way I would normally return an array.

sub getInfo {
  my @array;
  push @array, 'foo';
  # ...
  return @array if wantarray;
  return \@array;
}

This way it will work the way you want in either scalar or list context.

my $array = getInfo;
my @array = getInfo;

# same first element (eq, because the elements are strings)
$array->[0] eq $array[0];

# same length
@$array == @array;

I wouldn't try to optimize it unless you know it is a slow part of your code. Even then I would use benchmarks to see which subroutine is actually faster.
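
A short sketch of what each calling context gets from a sub written this way. The name get_info and the sample data are assumptions for illustration. Since scalar context yields a reference rather than a count, one way to get the count is to force list context with an empty-list assignment.

```perl
use strict;
use warnings;

sub get_info {               # same shape as getInfo() above, renamed for this sketch
    my @array = ('foo', 'bar', 'baz');
    return @array if wantarray;
    return \@array;
}

my @list  = get_info();          # list context: a copy of the elements
my $ref   = get_info();          # scalar context: an array reference, not a count
my $count = () = get_info();     # forces list context, then counts the elements

print ref($ref), "\n";           # prints "ARRAY"
print "$count\n";                # prints 3
```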

Brad Gilbert

Then you can't get the count by assigning getInfo() to a scalar value. http://perlmonks.org/?node_id=729965 has an interesting debate about the use of wantarray. – daotoad Feb 13 '09 at 19:09
I agree. I used to use `wantarray` about three years ago; I thought it was a cool feature. After many years of experience on big Perl projects with developers of widely varying skill, I have decided that context-aware code is one of the worst things in Perl. – Hynek -Pichi- Vychodil Feb 14 '09 at 13:30
@daotoad: You can never assume that a function returning a list in list context will return its length in scalar context, since that only happens when the function returns an array. If the function returns a list value, you'll receive the last element of the list. Why? Because Perl HATES you. :) – j_random_hacker Feb 14 '09 at 15:08
@j_random_hacker: Nice point ;-) And I hate Perl and using return @{[1,2,3]} when he hates me. – Hynek -Pichi- Vychodil Feb 14 '09 at 15:37
@Hynek: @{[1, 2, 3]} is a good trick, though (as you may have picked up) I hate that it's necessary. :) – j_random_hacker Feb 14 '09 at 16:35
I was showing how to avoid copying an array, and stating that it probably wasn't worth trying to do. – Brad Gilbert Feb 15 '09 at 04:31
I created an interesting puzzle for myself once by doing: "return $start..$end;" In scalar context, .. is the flip flop operator and does not behave at all like I expected. That taught me to always assign to an array before returning a group of results (unless I want non-array like behavior). – daotoad Feb 16 '09 at 04:45
@daotoad: Ouch! Yes, propagation of context to return statements in functions is definitely one of Perl's less intuitive aspects. – j_random_hacker Feb 17 '09 at 10:46
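
The list-versus-array behavior discussed in these comments can be seen in a few lines. This is a sketch with made-up sub names; each sub is called in scalar context:

```perl
use strict;
use warnings;

sub return_list  { return (10, 20, 30) }                  # a list, not an array
sub return_array { my @a = (10, 20, 30); return @a }      # a real array
sub return_trick { return @{[10, 20, 30]} }               # anonymous array, dereferenced

my $from_list  = return_list();    # comma operator in scalar context: last element
my $from_array = return_array();   # array in scalar context: element count
my $from_trick = return_trick();   # also an array, so also the count

print "$from_list $from_array $from_trick\n";   # prints "30 3 3"
```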

score 2 · Answer 6 · answered Feb 14 '09 at 00:18

There are two considerations. The obvious one is how big your array is going to get. If it's fewer than a few dozen elements, then size is not a factor (unless you're micro-optimizing for some rapidly called function, but you'd have to do some memory profiling to prove that first).

That's the easy part. The oft overlooked second consideration is the interface. How is the returned array going to be used? This is important because whole array dereferencing is kinda awful in Perl. For example:

for my $info (@{ getInfo($some, $args) }) {
    ...
}

That's ugly. This is much better.

for my $info ( getInfo($some, $args) ) {
    ...
}

It also lends itself to mapping and grepping.

my @info = grep { ... } getInfo($some, $args);

But returning an array ref can be handy if you're going to pick out individual elements:

my $address = getInfo($some, $args)->[2];

That's simpler than:

my $address = (getInfo($some, $args))[2];

Or:

my @info = getInfo($some, $args);
my $address = $info[2];

But at that point, you should question whether @info is truly a list or a hash.

my $address = getInfo($some, $args)->{address};

What you should not do is have getInfo() return an array ref in scalar context and an array in list context. This muddles the traditional use of scalar context as array length, which will surprise the user.

Finally, I will plug my own module, Method::Signatures, because it offers a compromise for passing in array references without having to use the array ref syntax.

use Method::Signatures;

method foo(\@args) {
    print "@args";      # @args is not a copy
    push @args, 42;   # this alters the caller array
}

my @nums = (1,2,3);
Class->foo(\@nums);   # prints 1 2 3
print "@nums";        # prints 1 2 3 42

This is done through the magic of Data::Alias.

You can never assume that a function returning a list in list context will return its length in scalar context, since that only happens when the function returns an array. If the function returns a non-array list value, you'll receive the last element of the list instead of its size. — j_random_hacker, Feb 14 '09 at 15:12
Then don't return lists! If your hammer handle gives you splinters, don't wear gloves, sand it smooth! The whole "list vs array" thing in Perl 5 is a giant, gaping bear trap right in the middle of the playground. — Schwern, Feb 16 '09 at 23:37
I totally agree with your last sentence. I would add that most of the kids in the playground, and maybe even the playground designers, don't know about this bear trap. :) — j_random_hacker, Feb 17 '09 at 10:51

score 0 · Answer 7 · answered Mar 25 '16 at 05:57

Three other potentially LARGE performance improvements if you are reading an entire, largish file and slicing it into an array:

Bypass buffered I/O by using sysread() instead of read() (the manual warns against mixing the two on the same filehandle)
Pre-extend the array by assigning to its last element, which saves repeated memory allocations
Use unpack() to split binary data quickly, such as uint16_t graphics channel data

Passing an array ref to the function lets the main program deal with a simple array, while the write-once-and-forget worker function uses the more complicated "@$ref" and arrow ->[$i] access forms. Being quite C'ish, it is likely to be fast!
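
A small sketch of the last two tips plus the array-ref worker pattern. The data is generated with pack() rather than read from a file, so sysread() is not shown here. The sub name fill_squares and the sample values are assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for data read from a file: three native unsigned 16-bit values.
my $raw = pack 'S*', 1, 2, 65535;

# unpack() splits the whole buffer in one call ('S' is an unsigned 16-bit short).
my @samples = unpack 'S*', $raw;
print "@samples\n";              # prints "1 2 65535"

# Worker that fills a caller-owned array through a reference.
sub fill_squares {
    my ($out, $count) = @_;
    $#$out = $count - 1;         # pre-extend to the final size in one step
    $out->[$_] = $_ * $_ for 0 .. $count - 1;
    return;
}

my @squares;
fill_squares(\@squares, 5);
print "@squares\n";              # prints "0 1 4 9 16"
```

Whether pre-extending helps measurably depends on the array size, so it is worth benchmarking before relying on it.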

score -4 · Answer 8 · answered Feb 13 '09 at 15:00

I know nothing about Perl so this is a language-neutral answer.

It is, in a sense, inefficient to copy an array from a subroutine into the calling program. The inefficiency arises in the extra memory used and the time taken to copy the data from one place to another. On the other hand, for all but the largest arrays, you might not give a damn, and might prefer to copy arrays out for elegance, cussedness or any other reason.

The efficient solution is for the subroutine to pass the calling program the address of the array. As I say, I haven't a clue about Perl's default behaviour in this respect. But some languages provide the programmer the option to choose which approach.

The "address of the array" in Perl is a reference. The question is whether Perl optimizes for it. — Max Lybbert, Feb 14 '09 at 01:47
