I am having difficulty getting my head around variable declarations in the following scenario. I have a file with ten words, one per line. First I want to loop through the file and create a new file for each word. Example:

banana
apple
coconut
strawberry

-->

banana.txt
apple.txt
coconut.txt
strawberry.txt

The first problem that I'm having is: how do I assign a unique variable for the file handle for each file in the loop? I would write something like this but I don't know if that's the way to go:

open(my $tokensfh, '<', $tokensfile)
  or die "cannot open file $tokensfile";
chomp(my @tokenslines = <$tokensfh>);
close $tokensfh;

foreach my $token(@tokenslines) {
  open(my $token.'fh', '>>', $token."data.txt");
} 

A bit further down the line I match other data against the $token, but I'm unsure how to deal with the variables:

foreach my $somedata(@data) {
    my $datatoken = $somedata=~  /<fruit>(.+)<\/fruit>/;

    # Do I need a new variable name here?
    foreach my $tokensline(@tokenslines) {
        if ($datalinetoken eq $datatoken ) {
          # print $somedata to specific file
          print $tokensline.'fh' "average run time\n";
        }
    }
}

Do I need a new variable name? If not, how can I re-use the earlier variable without getting variable assignment issues? Is there a better way to do this? (Please answer all questions.)

Bram Vanroy
  • `my %fh; foreach my $token (@tokenslines) { open($fh{$token}, '>>', $token."data.txt") or die "A horrible death"; }`? – Jonathan Leffler Mar 19 '16 at 19:23
  • A clean solution might come with a good OO flavouring thrown into the kettle ;-) – laune Mar 19 '16 at 19:31
  • Oh, and some sample data would be good - this looks like XML, so an XML parser is recommended. – Sobrique Mar 19 '16 at 20:34
  • Please *always* `use strict` and `use warnings 'all'`. Your code would throw several warnings telling you that your `$token` variable is uninitialized, with the result that you are reopening the file `data.txt` for output multiple times on the file handle `fh` – Borodin Mar 19 '16 at 20:39
  • @Borodin I did use those two, but seems to clutter the core to the problem here so I didn't include it in the question. I typed the example from scratch and didn't test it because I was just throwing out the general idea. – Bram Vanroy Mar 19 '16 at 21:07
  • @Sobrique I was typing up some example for the sake of this question, I didn't test this code nor is the data representative. I am not working with XML but with tab separated data. – Bram Vanroy Mar 19 '16 at 21:08

2 Answers

Don't do this. It's very nasty to use a variable variable name. See this link for a more detailed explanation of why: http://perl.plover.com/varvarname.html

If you need named filehandles, you're much better off using a hash of filehandles. A hash is a portable namespace, which is exactly what you need here.

So:

my %fh_for;
foreach my $token ( @tokenslines ) {
   # No "my" here: the hash is already declared, and open fills in the element
   open( $fh_for{$token}, '>', "$token.txt" ) or die "Cannot open $token.txt: $!";
}

foreach my $datalinetoken (@tokenslines) {
    if ($datalinetoken eq $datatoken ) {
      # print $somedata to specific file
      print {$fh_for{$datalinetoken}} "average run time\n";
    }
}

Then you can write to a filehandle keyed by your token name, without needing the icky messiness of dynamic variable naming. Note that I've wrapped the filehandle in {}: print needs a block whenever the handle comes from anything more complex than a plain scalar variable, such as a hash element.
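
Putting the pieces together, here is a minimal, self-contained sketch of the hash-of-filehandles idea. The token list, the tab-separated sample lines and the temporary directory are assumptions made up for illustration; they stand in for the real tokens file and data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical inputs standing in for the tokens file and the data lines.
my @tokens = qw(banana apple coconut);
my @data   = ( "1\tapple", "2\tcoconut", "3\tapple", "4\tkiwi" );

my $dir = tempdir( CLEANUP => 1 );    # scratch directory, removed at exit

# One output handle per token, keyed by the token itself.
my %fh_for;
for my $token (@tokens) {
    open( $fh_for{$token}, '>', "$dir/$token.txt" )
        or die "Cannot open $dir/$token.txt: $!";
}

for my $line (@data) {
    my ( undef, $fruit ) = split /\t/, $line;
    next unless exists $fh_for{$fruit};    # no file was opened for this token

    # The block around the hash element tells print which handle to use.
    print { $fh_for{$fruit} } "$line\n"
        or warn "Cannot write to $fruit.txt: $!";
}

close $_ for values %fh_for;
```

The same variable name can be reused for the loop variable in both loops if you prefer; all that matters is that the value used as the hash key matches the key the handle was stored under.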

Sobrique
  • From [print](http://perldoc.perl.org/functions/print.html), it says: _If you're storing handles in an array or hash, or in general whenever you're using any expression more complex than a bareword handle or a plain, unsubscripted scalar variable to retrieve it, you will have to use a block returning the filehandle value instead, in which case the LIST may not be omitted:_. So, I think this says the block is mandatory. – Chris Charley Mar 19 '16 at 20:06
  • So, even though $token and $datatoken would be the same thing, we cannot use the same variable name? (Note that you forgot the embedded for loop.) – Bram Vanroy Mar 19 '16 at 20:08
  • Yeah, good point. I wasn't sure - I know I've hit places where it was, I just couldn't remember if this was one. – Sobrique Mar 19 '16 at 20:08
  • You can use the same variable name if you wish. I just used what you were using. As long as the variable contains 'coconut' it will print to the coconut filehandle. You should probably include an 'or warn' after the print, to make sure it does work. – Sobrique Mar 19 '16 at 20:12
  • *"It's very nasty to use a variable variable name"* It's actually *wrong* here because `open my $token.'fh', '>>', $token."data.txt"` will declare a new `$token` set to `undef`, and then do `open 'fh', '>>', 'data'`. It will warn about the uninitialised value in each case of course – Borodin Mar 19 '16 at 20:38

You can use the same lexical (my) variable name repeatedly as long as each one is declared in a different scope. Perl will warn you if you declare the same variable twice in the same scope. I have used the same name $fh for a file handle in my code below without any consequences
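
As a small illustration of that scoping rule (the variable name $name and the strings are made up for this sketch), each block below gets its own independent variable:

```perl
use strict;
use warnings;

my @seen;

{
    my $name = 'first block';     # visible only inside this block
    push @seen, $name;
}

{
    my $name = 'second block';    # a separate variable that happens to share the name
    push @seen, $name;
}

for my $name (qw(loop-a loop-b)) {    # the loop variable is scoped to the loop body
    push @seen, $name;
}

print "$_\n" for @seen;
```

Outside those blocks $name does not exist at all, so under use strict a stray reference to it is a compile-time error rather than a silent bug.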

In this case you need the file handles to be opened for most of the program, so you need a whole set of them, and it looks like it's easiest to use a hash, so that you can just pick the correct file handle by indexing the hash with the token string

It would look something like this. Note that I've used the autodie pragma to avoid having to check the status of every IO operation explicitly. You may also want to consider whether you will need to handle the difference between apple, APPLE and Apple, which at the moment will create three file handles (and would confuse Windows dreadfully!)

Oh, and by the way, it's far nicer just to process each file line by line with a while instead of reading it all into an array and processing the data from there

use strict;
use warnings 'all';
use v5.14.1; # For autodie
use autodie;

use constant TOKENS_FILE => 'tokens.txt';
use constant XML_FILE    => 'data.xml';

my %token_fh;

{
    open my $fh, '<', TOKENS_FILE;

    while ( <$fh> ) {
        chomp;
        open $token_fh{$_}, '>', "${_}data.txt";
    }
}

{
    open my $fh, '<', XML_FILE;

    while ( <$fh> ) {

        next unless my ($token) = m|<fruit>(.+)</fruit>|;
        next unless my $fh = $token_fh{$token};

        print $fh "average run time\n";
    }
}

close $_ for values %token_fh;



An alternative way would be to forget about the tokens file altogether, and just open files as they are encountered in the XML. That would look like this

use strict;
use warnings 'all';
use v5.14.1; # For autodie
use autodie;

use constant XML_FILE    => 'data.xml';

my %token_fh;

open my $fh, '<', XML_FILE;

while ( <$fh> ) {

    next unless my ($token) = m|<fruit>(.+)</fruit>|;

    unless ( exists $token_fh{$token} ) {
        open $token_fh{$token}, '>', "${token}data.txt";
    }
    my $fh = $token_fh{$token};

    print $fh "average run time\n";
}

close $_ for values %token_fh;
Borodin