Perl: Removing array items and resizing the array

Question

I’m trying to filter an array of terms using another array in Perl. I have Perl 5.18.2 on OS X, though the behavior is the same if I use 5.010. Here’s my basic setup:

#!/usr/bin/perl
#use strict;
my @terms = ('alpha','beta test','gamma','delta quadrant','epsilon',
             'zeta','eta','theta chi','one iota','kappa');
my @filters = ('beta','gamma','epsilon','iota');
foreach $filter (@filters) {
    for my $ind (0 .. $#terms) {
        if (grep { /$filter/ } $terms[$ind]) {
            splice @terms,$ind,1;
        }
    }
}

This works to pull out the lines that match the various search terms, but the array length doesn’t change. If I write out the resulting @terms array, I get:

[alpha]
[delta quadrant]
[zeta]
[eta]
[theta chi]
[kappa]
[]
[]
[]
[]

As you might expect from that, printing scalar(@terms) gets a result of 10.

What I want is a resulting array of length 6, without the four blank items at the end. How do I get that result? And why isn’t the array shrinking, given that the perldoc page about splice says, “The array grows or shrinks as necessary.”?

(I’m not very fluent in Perl, so if you’re thinking “Why don’t you just...?”, it’s almost certainly because I don’t know about it or didn’t understand it when I heard about it.)

`grep` operates on arrays and returns matching elements. Maybe you mean `$terms[$ind] =~ /$filter/` to match a single one? — tadman, Dec 11 '16 at 21:40
Yep, looks like that works as intended—thanks! I’m still confused about why the array didn’t shrink with what I was doing before. — Eric A. Meyer, Dec 11 '16 at 21:48
It's always tricky to remove elements from an array you're actively iterating over. That shifts the offset by 1 each time you splice something out. — tadman, Dec 11 '16 at 21:49
FWIW, [`use VERSION`](http://perldoc.perl.org/functions/use.html) only specifies the _minimum_ version needed; it doesn't emulate the Perl interpreter as it existed at that version. — Matt Jacob, Dec 12 '16 at 00:05

score 7 · Accepted Answer · edited Dec 12 '16 at 02:15

You can always regenerate the array minus the things you don't want. grep acts as a filter, letting you decide which elements to keep and which to drop:

#!/usr/bin/perl

use strict;
use warnings;

my @terms = ('alpha','beta test','gamma','delta quadrant','epsilon',
           'zeta','eta','theta chi','one iota','kappa');
my @filters = ('beta','gamma','epsilon','iota');

# Build a lookup hash whose keys are the filter strings.
my %filter_exclusion = map { $_ => 1 } @filters;

# Keep only the terms that are not a key in the lookup hash.
my @filtered = grep { !$filter_exclusion{$_} } @terms;

print join(',', @filtered) . "\n";

It's pretty easy if you have a simple structure like %filter_exclusion on hand. Note that the hash lookup only removes terms that are exactly equal to one of the filters.

Update: If you want to allow arbitrary substring matches:

my $filter_exclusion = join '|', map quotemeta, @filters;

my @filtered = grep { !/$filter_exclusion/ } @terms;

answered Dec 11 '16 at 21:47 by tadman · edited Dec 12 '16 at 02:15 by ikegami

That one only partly works—it filters out `gamma` and `epsilon`, but not `beta test` or `one iota`. Useful to have on hand for future projects, though! – Eric A. Meyer Dec 11 '16 at 22:00
Added a version that tests arbitrary substrings. This one uses a regular expression again, but just one test per entry, not N tests. – tadman Dec 11 '16 at 22:07
Cool, thanks! That does indeed work. Mind you, I have no idea whatsoever how or why it works. – Eric A. Meyer Dec 12 '16 at 02:33
`grep` acts like a pass-fail filter on each element in `@terms` here, so for a given `$_` from `@terms` it tests if it matches that pattern or not. The pattern is just a regular expression that matches any one of them as substrings. – tadman Dec 12 '16 at 15:35
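For anyone else unsure how the substring version works, here is a minimal sketch that prints the combined pattern before using it. It reuses the `@terms` and `@filters` data from the question. The `$pattern` and `$re` names and the `qr//` precompilation are choices made for this sketch, not part of the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @terms = ('alpha', 'beta test', 'gamma', 'delta quadrant', 'epsilon',
             'zeta', 'eta', 'theta chi', 'one iota', 'kappa');
my @filters = ('beta', 'gamma', 'epsilon', 'iota');

# quotemeta escapes regex metacharacters so each filter matches literally;
# join '|' then combines them into a single alternation.
my $pattern = join '|', map { quotemeta($_) } @filters;
print "pattern: $pattern\n";    # pattern: beta|gamma|epsilon|iota

# Compile the alternation once (an assumption of this sketch).
my $re = qr/$pattern/;

# grep keeps an element when the block is true, i.e. when no filter matched.
my @filtered = grep { !/$re/ } @terms;

print "[$_]\n" for @filtered;
print scalar(@filtered), "\n";  # 6
```

Because `grep` builds a new list, `@terms` itself is left untouched and no index bookkeeping is needed.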

score 0 · Answer 2 · answered Dec 11 '16 at 22:05

To see what's going on, print the contents of the array at each step. When you splice the array, it does shrink, but the range 0 .. $#terms is evaluated once, before the inner loop starts, so toward the end of the loop $ind points past the end of the array. When you use grep { ... } $array[ $too_large ], Perl needs to alias the non-existent element to $_ inside the grep block, so it creates an undef element in the array. The following script prints the array after every iteration so you can watch this happen:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @terms = ('alpha', 'beta test', 'gamma', 'delta quadrant', 'epsilon',
             'zeta', 'eta', 'theta chi', 'one iota', 'kappa');
my @filters = qw( beta gamma epsilon iota );

for my $filter (@filters) {
    say $filter;
    for my $ind (0 .. $#terms) {
        if (grep { do {
            no warnings 'uninitialized';
            /$filter/
        } } $terms[$ind]
        ) {
            splice @terms, $ind, 1;
        }
        say "\t$ind\t", join ' ', map $_ || '-', @terms;
    }
}

If you used $terms[$ind] =~ /$filter/ instead of grep, you'd still get uninitialized warnings, but as there's no need to alias the element, it won't be created.
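The difference between the two forms can be seen in isolation with a small sketch. The array names `@via_grep` and `@via_match` and the index 5 are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @via_grep  = ('a', 'b', 'c');
my @via_match = ('a', 'b', 'c');

# grep aliases $_ to each list item, so the out-of-range element is
# brought into existence (as undef) and the array is extended to reach it.
my $grep_hits = grep { defined($_) && /x/ } $via_grep[5];

# A plain match only reads the element, so nothing is created.
my $match_hit = do {
    no warnings 'uninitialized';    # the missing element reads as undef
    $via_match[5] =~ /x/;
};

print scalar(@via_grep), "\n";      # 6
print scalar(@via_match), "\n";     # 3
```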

@ikegami: I don't see `gamma` in the output. Moreover, this is not a "fix", it should only demonstrate WHY and WHEN the trailing elements are created - therefore, they're still there. — choroba, Dec 12 '16 at 08:19
@ikegami: If I `print "@terms"`, I see `alpha delta quadrant zeta eta theta chi kappa`. — choroba, Dec 12 '16 at 12:53
Oh sorry, the bug happens if you start with `@terms = qw( gamma gamma kappa );`. The second gamma gets moved into `$terms[0]`, which isn't revisited. — ikegami, Dec 12 '16 at 13:08
@ikegami: True, you're right. But I was just trying to explain why the undefs exist. — choroba, Dec 12 '16 at 13:17
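If the array really has to be modified in place, one common way to avoid both problems discussed here (the skipped element and the out-of-range index) is to walk the indices from the end toward the start. The following is only a sketch under that assumption. The `remove_matching` helper is hypothetical, and it treats each filter as a literal substring via `\Q...\E`, which differs slightly from the plain `/$filter/` in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for this sketch: removes, in place, every element of
# @$terms that contains any of @$filters as a literal substring.
sub remove_matching {
    my ($terms, $filters) = @_;
    for my $filter (@$filters) {
        # Walk from the last index down to 0, so a splice only shifts
        # elements that have already been checked.
        for my $ind (reverse 0 .. $#$terms) {
            splice @$terms, $ind, 1 if $terms->[$ind] =~ /\Q$filter\E/;
        }
    }
    return;
}

my @terms = ('alpha', 'beta test', 'gamma', 'delta quadrant', 'epsilon',
             'zeta', 'eta', 'theta chi', 'one iota', 'kappa');
remove_matching(\@terms, ['beta', 'gamma', 'epsilon', 'iota']);
print scalar(@terms), "\n";    # 6

# The case from the comments: two adjacent matches.
my @repeats = qw( gamma gamma kappa );
remove_matching(\@repeats, ['gamma']);
print "@repeats\n";            # kappa
```

Since every index visited is at or below the current last index, the loop never reads past the end of the array, so no undef elements are created.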
