Perl: array goes empty after passing it to a function?

Question

I'm working on a project that has been scaling up a lot lately, and I'm rewriting code to make it more OOP and moving all redundant code into subroutines.

The script checks whether a gene exists in the database (through various means) or not. It may also report possible duplicates. Before reporting a duplicate, the script makes sure it's not a "biological duplicate" (essentially the same biological data but with a different position in the genome and, hence, not an actual duplicate). In order to do so...

my @gene_ids;
my @gene_names;

while (my $gene = $geners_bychecksum->next) {
    my $gene_name = $gene->gene_name;
    my $gene_id   = $gene->gene_id;

    push @gene_ids, $gene_id;
    push @gene_names, $gene_name;
}

print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";
my $solve_multihit = solve_multihit($id, \@gene_names, \@gene_ids, $spc, $species_directory, $dataset);
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";

if ($solve_multihit) {
    print STDERR "$id\tM\tUPDATE \n";
    print $report "$id\tM\tUPDATE \n";
    $countM++;
} else {
    print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";
}

Here, $geners_bychecksum is a DBIC resultset with database hits from a prior search and, in this scenario, it always has more than one gene. $id, $spc, $species_directory and $dataset are all strings that come from the config and are defined above this chunk.

The solve_multihit subroutine is a rather complicated function that tries to resolve whether the multi-hits are actual duplicates or biological duplicates. Notice that I'm passing references to the @gene_names and @gene_ids arrays to this function. It returns the gene_id of the proper gene if it was able to resolve the discrepancy, or 0 if not. Simplified code for the sub can be found at the following link:

https://codeshare.io/2EM8qN

THE ACTUAL QUESTION

You may have noticed that the

print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";

is both before and after the solve_multihit subroutine call... and the array seems to go empty after running the function, according to the STDERR:

BBOV_I005030    J   ALERT CHECKSUM MULTI-HIT    (XP_001609152.1,XP_001609157.1)
BBOV_I005030    J   ALERT CHECKSUM MULTI-HIT    ()
BBOV_I005040    J   ALERT CHECKSUM MULTI-HIT    (XP_001609156.1,XP_001609153.1)
BBOV_I005040    J   ALERT CHECKSUM MULTI-HIT    ()
BBOV_I005050    J   ALERT CHECKSUM MULTI-HIT    (XP_001609154.1,XP_001609155.1)
BBOV_I005050    J   ALERT CHECKSUM MULTI-HIT    ()
BBOV_I005060    J   ALERT CHECKSUM MULTI-HIT    (XP_001609154.1,XP_001609155.1)
BBOV_I005060    J   ALERT CHECKSUM MULTI-HIT    ()
BBOV_I005070    J   ALERT CHECKSUM MULTI-HIT    (XP_001609156.1,XP_001609153.1)
BBOV_I005070    J   ALERT CHECKSUM MULTI-HIT    ()
BBOV_I005080    J   ALERT CHECKSUM MULTI-HIT    (XP_001609152.1,XP_001609157.1)
BBOV_I005080    J   ALERT CHECKSUM MULTI-HIT    ()

Why would that happen? I'm pretty sure I could solve it by returning the arrays along with the result of the solve_multihit sub, but I wonder why it would go empty.

PS: The J in the report is just a case-scenario key code.

There's obviously something in `solve_multihit()` which modifies the input arrays (definitely `@gene_names`, possibly also `@gene_ids`), but we can only guess at what that "something" is without seeing what's in the sub. Is it maybe using `shift` or `pop` to access the contents of the array(s)? – Dave Sherohman, Aug 23 '17 at 17:17
I didn't know I could be modifying the input! I'll add the sub to the post. Thanks – Leitouran, Aug 23 '17 at 17:21
@zdim, Re "*The function arguments are aliased, and if you change @_ you change them.*", Changing `@_` wouldn't have the described effect. Changing `@{ $_[1] }` would. – ikegami, Aug 23 '17 at 17:27
@ikegami Right, thank you. It was a rushed and clumsy "fix" of an initial comment which was imprecise to the point of being wrong. I'm removing the comment. – zdim, Aug 23 '17 at 17:52

ikegami · Answer 1 · 2017-08-24T03:34:28.170

my @gene_names = splice(shift);
my @gene_ids   = splice(shift);

is short for

my @gene_names = splice(@{ shift(@_) });
my @gene_ids   = splice(@{ shift(@_) });

splice(@a) empties the array and returns its contents. There's no reason to do that! The above should be

my @gene_names = @{ shift(@_) };
my @gene_ids   = @{ shift(@_) };
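
To see the difference in isolation, here is a minimal sketch. The sub names take_with_splice and take_with_copy are made up for the illustration and are not part of the original code; the gene names are taken from the question's output.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub take_with_splice {
    my $aref  = shift;
    my @items = splice(@$aref);   # removes every element from the caller's array
    return scalar @items;
}

sub take_with_copy {
    my $aref  = shift;
    my @items = @$aref;           # copies the elements; the caller's array is untouched
    return scalar @items;
}

my @spliced = ('XP_001609152.1', 'XP_001609157.1');
my @copied  = ('XP_001609152.1', 'XP_001609157.1');

my $n_spliced = take_with_splice(\@spliced);
my $n_copied  = take_with_copy(\@copied);

print "after splice: ", scalar(@spliced), " element(s) left\n";
print "after copy:   ", scalar(@copied),  " element(s) left\n";
```

Both subs see two elements, but only the splice version leaves the caller with an empty array.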

Honestly, there's no need to make a copy of the array. Just use the provided reference.

my $gene_names = shift;
my $gene_ids   = shift;
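
As a rough sketch of what working through the references could look like (describe_hits is a hypothetical sub and the numeric IDs are made up; the real solve_multihit does much more):

```perl
use strict;
use warnings;

# Hypothetical sub: reads through the references without copying or modifying.
sub describe_hits {
    my $gene_names = shift;
    my $gene_ids   = shift;

    my @lines;
    for my $i (0 .. $#{$gene_names}) {
        # $ref->[$i] accesses one element through the reference
        push @lines, "$gene_ids->[$i]: $gene_names->[$i]";
    }
    return @lines;
}

my @gene_names = ('XP_001609152.1', 'XP_001609157.1');
my @gene_ids   = (101, 102);   # made-up IDs for the example

my @lines = describe_hits(\@gene_names, \@gene_ids);
print "$_\n" for @lines;
print "caller still has ", scalar(@gene_names), " names\n";
```

Nothing is copied here, and as long as the sub only reads through the references, the caller's arrays stay as they were.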

I'd provide a fixed-up version of solve_multihit, but it has numerous major problems I can't fix with the information I have.

zdim · Accepted Answer · 2017-08-25T08:58:15.277

I can see two ways for your code to accomplish the data removal that it seems to be doing.

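The function arguments available in @_ are aliased to the data passed to the function. So if you assign to the elements of @_ (for example $_[0] = ...) you change the data outside of the function. Note that removing elements from @_ itself (with shift, say) only drops the aliases, so this alone would not empty an array passed by reference.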
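
A small sketch of what that aliasing does and does not reach, in line with ikegami's comment under the question. The sub names overwrite_element and consume_args are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub overwrite_element {
    $_[0] = 'changed';   # $_[0] is an alias for the caller's variable
}

sub consume_args {
    shift @_;            # removes the alias from @_ only
    @_ = ();             # likewise leaves the caller's data alone
}

my $name = 'original';
overwrite_element($name);
print "$name\n";               # the caller's scalar was overwritten

my @names = ('a', 'b');
consume_args(\@names);
print scalar(@names), "\n";    # the caller's array still has both elements
```

Assigning to an element of @_ writes through to the caller, while shifting or clearing @_ does not touch the array that a passed reference points to.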

More likely, as you are passing by reference, your sub probably works directly with it

use feature 'say';

sub ff {
    my ($rary) = @_;
    @$rary = ();    # empties the caller's array through the reference
}

my @data = 1..4;

ff(\@data);

say for @data;  # prints nothing: @data is now empty

If your processing needs to change the array it works with then make a local copy first

sub ff {
    my ($rary) = @_;
    my @local_ary = @$rary;
    # now changes to @local_ary do not affect @data in the caller
}

This is generally safer, though it does introduce a data copy, which doesn't happen when working with the reference.

The edit together with ikegami's answer clears this up: splice is destructive to the array it works with, and here, by curious syntax, it is fed the caller's own array, reached by dereferencing the reference shifted off @_. That is how it changes the data in the caller.

There is no reason for splice in what you do. Its purpose is to change the array.

Instead, use arrayrefs that are passed to the sub

sub solve_multihit {
    my ($id, $gene_names, $gene_ids, ...) = @_;
    foreach my $name (@$gene_names) {
        ...
    }
    ...
}

or make a local copy if you wish

sub solve_multihit { 
    my $id = shift;
    my @gene_names = @{ shift @_ };
    ...
}

where my @gene_names is a lexical variable in this scope (the sub, in your case), and changes to it do not affect the array with the same name in the calling scope.
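
A minimal runnable sketch of that last variant. solve_multihit_sketch is a hypothetical stand-in with an invented body, not the real solve_multihit; the ID and gene names come from the question's output.

```perl
use strict;
use warnings;

# Simplified, hypothetical stand-in for solve_multihit.
sub solve_multihit_sketch {
    my $id = shift;
    my @gene_names = @{ shift @_ };   # lexical copy, private to this sub

    my $first = shift @gene_names;    # destructive, but only for the copy
    return $first;
}

my @gene_names = ('XP_001609152.1', 'XP_001609157.1');
my $picked = solve_multihit_sketch('BBOV_I005030', \@gene_names);

print "picked $picked; caller still has: ", join(',', @gene_names), "\n";
```

The two @gene_names arrays share a name but are separate variables, so even a destructive shift inside the sub leaves the caller's list complete.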

edited Aug 25 '17 at 08:58

answered Aug 23 '17 at 17:27

zdim


Thanks for the quick response, but I'm missing something... How can I access the elements in `$gene_names` after storing it like that? A quick debug over the `@_` values, `foreach my $var (values @_){ print STDERR $var."\n"; }`, shows this: BBOV_I001860 | ARRAY(0x3fa6180) <--- gene_names | ARRAY(0x3fa5898) <--- gene_ids | bbo | bbo | omcl | – Leitouran Aug 23 '17 at 18:11

@LionelUranLandaburu Then you work with an _arrayref_. You can access an element by `$$arrayref[0]` or by `$arrayref->[0]` (better when code is more complex). You can iterate by, say, `for my $elem (@$arrayref) { ... }`. Etc. I'd suggest to _not use `@_` directly_. Just say 'no'. – zdim Aug 23 '17 at 20:46

@LionelUranLandaburu Btw, the `values` that you use had been intended to return values of a hash, and had until v5.12 only worked with a hash. Using it with an array may cause confusion. More importantly, if you want values of an array it's enough to say `foreach my $var (@array)`. – zdim Aug 23 '17 at 20:52

@LionelUranLandaburu When you do `\@data` you take a _reference_ to the array, getting a scalar variable much like a pointer. That reference is what you pass to your sub. Then you need to _dereference_ it in order to access the elements, and my comment above is about that. See [perlreftut](http://perldoc.perl.org/perlreftut.html) – zdim Aug 23 '17 at 21:24

@LionelUranLandaburu Added an example of how to access arrayref elements. – zdim Aug 24 '17 at 08:11
