4

Is there any way to force Perl to call FETCHSIZE on a tied array before each call to FETCH? My tied array knows its maximum size, but could shrink from this size depending on the results of earlier FETCH calls. here is a contrived example that filters a list to only the even elements with lazy evaluation:

use warnings;
use strict;

package VarSize;

sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while $$self[$index] % 2;
    $$self[$index]
}
sub FETCHSIZE {scalar @{$_[0]}}

my @source = 1 .. 10;

tie my @output => 'VarSize', [@source];

print "@output\n";  # array changes size as it is read, perl only checks size
                    # at the start, so it runs off the end with warnings
print "@output\n";  # knows correct size from start, no warnings

for brevity I have omitted a bunch of error checking code (such as how to deal with accesses starting from an index other than 0)

EDIT: rather than the above two print statements, if ONE of the following two lines is used, the first will work fine, the second will throw warnings.

print "$_ " for @output;   # for loop "iterator context" is fine,
                           # checks FETCHSIZE before each FETCH, ends properly

print join " " => @output; # however a list context expansion 
                           # calls FETCHSIZE at the start, and runs off the end

Update:

The actual module that implements a variable sized tied array is called List::Gen which is up on CPAN. The function is filter which behaves like grep, but works with List::Gen's lazy generators. Does anyone have any ideas that could make the implementation of filter better?

(the test function is similar, but returns undef in failed slots, keeping the array size constant, but that of course has different usage semantics than grep)

Eric Strom
  • 39,821
  • 2
  • 80
  • 152
  • 1
    Add a call to FETCHSIZE at the start of the FETCH sub? – Anon. Dec 21 '09 at 20:49
  • adding a call to FETCHSIZE inside FETCH will get me the size there, but it won't stop the warnings that the first `print` statement generates. the trick is how do we get Perl to call FETCHSIZE as it iterates through the array – Eric Strom Dec 21 '09 at 21:04
  • I'm pretty sure that the answer is "if you have a FETCH that removes elements, using the array as a list will blow up and you can't get around that." – hobbs Dec 21 '09 at 21:28
  • @hobbs => that's what is happening now, i am hoping that there is some workaround (like setting some magic value that tells Perl to treat the array as if it was in a `for` loop, rather than an argument list). – Eric Strom Dec 21 '09 at 21:36
  • 2
    @Eric I think you need to talk more about what you're really trying to accomplish here. Because what you're asking for is painful to make work and breaks the expectations about arrays. – Schwern Dec 22 '09 at 04:57

2 Answers2

1
sub FETCH {
    my ($self, $index) = @_;
    my $size = $self->FETCHSIZE;
    ...
}

Ta da!

I suspect what you're missing is they're just methods. Methods called by tie magic, but still just methods you can call yourself.

Listing out the contents of a tied array basically boils down to this:

my @array;
my $tied_obj = tied @array;
for my $idx (0..$tied_obj->FETCHSIZE-1) {
    push @array, $tied_obj->FETCH($idx);
}

return @array;

So you don't get any opportunity to control the number of iterations. Nor can FETCH reliably tell if its being called from @array or $array[$idx] or @array[@idxs]. This sucks. Ties kinda suck, and they're really slow. About 3 times slower than a normal method call and 10 times than a regular array.

Your example already breaks expectations about arrays (10 elements go in, 5 elements come out). What happen when a user asks for $array[3]? Do they get undef? Alternatives include just using the object API, if your thing doesn't behave exactly like an array pretending it does will only add confusion. Or you can use an object with array deref overloaded.

So, what you're doing can be done, but its difficult to get it to work well. What are you really trying to accomplish?

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • that will get the size for use inside of FETCH, but the first call to `print` in my example will still throw warnings with this change. i tried returning a tied scalar from FETCHSIZE, but Perl still only calls it once at the start of the join – Eric Strom Dec 21 '09 at 21:08
  • 1
    The idea of changing the contents of an array on read is generally a bad idea, but I can't get more specific without knowing what it is you're really trying to do. Its difficult for a tied array to know if an individual index has been accessed or the whole thing (it would be a lot more efficient if there was a single method for the whole thing) or just a slice. Honestly, I would change it on store. – Schwern Dec 21 '09 at 22:44
  • i agree that its a suboptimal solution, i am working on an lazy version of grep that returns a tied array that only performs the minimum number of element tests to accurately return its result. for example, you have a list of a thousand items, apply the lazy grep to it, and then take a slice of the result from 2 .. 5, the tied array will walk through its source until there are 6 matching elements and then return the last 4. any suggestions of a different way to approach the problem? – Eric Strom Dec 21 '09 at 23:10
  • 2
    @Eric Return an iterator object. It'll be a lot less painful and efficient than dealing with tied arrays. Instead of `for my $thing (@list)` it'll be `while( my $thing = $iter->next )`. If you still want to go for tied arrays look at Tie::File for inspiration, it solves a similar problem. – Schwern Dec 22 '09 at 05:02
  • my module already implements a `->next` style iterative interface, and for other functions the tied interface is working well, its just the grep implementation that isn't working as cleanly as i would like. I have an early version up on cpan: List::Gen – Eric Strom Dec 22 '09 at 17:18
  • So you've reimplemented `Data::Stream::Bulk`? – jrockway Dec 24 '09 at 18:49
  • @jrockway => the core ideas have similarities, but the implementation is very different, for example, `List::Gen` provides lazy evaluation with random access within its objects – Eric Strom Feb 04 '10 at 00:10
0

I think that order in which perl calls FETCH/FETCHSIZE methods can't be changed. It's perls internal part. Why not just explicitly remove warnings:

sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while ($$self[$index] || 0) % 2;
    exists $$self[$index] ? $$self[$index] : '' ## replace '' with default value
}
Ivan Nevostruev
  • 28,143
  • 8
  • 66
  • 82
  • that does work for the case of only printing the values, but i am worried about that placeholder leading to action at a distance when a copied array silently ends up with empty strings at the end. of course the explicit solution is if you need the whole array, to just to ask the array for a value past the end, forcing it to calculate all the relevant values first, but i am hoping for an implicit solution – Eric Strom Dec 21 '09 at 23:03
  • If you were worried about action at a distance, reading an array would not cause its length to change. – jrockway Dec 24 '09 at 18:41
  • @jrockway => I agree, the behavior of this one function is an anomaly in my module, hence the question :) – Eric Strom Feb 04 '10 at 00:12