best practices when using Perl modules

Question

i'm basically new to modules and i'm trying to use them in my scripts. i am having trouble finding the right way of using them properly and i'd like your advice about it.

let me explain quickly what i'm trying to do :

my script is doing some file transfers, based on data from XML files.

so basically, i have XML files with contents like that :

<fftg>
    <actions>

            <!-- Rename file(s) -->
            <rename>
                    <mandatory>0</mandatory>
                    <file name="foo" to="bar" />
            </rename>

            <!-- Transfer file(s) -->
            <transfer>
                    <mandatory>0</mandatory>
                    <protocol>SFTP</protocol>
                    <server>fqdn</server>
                    <port>22</port>
                    <file name="bar" remotefolder="toto" />
            </transfer>

            <!-- Transfer file(s) -->
            <transfer>
                    <mandatory>0</mandatory>
                    <protocol>SFTP</protocol>
                    <server>fqdn</server>
                    <port>22</port>
                    <file name="blabla" remotefolder="xxxx" />
                    <file name="blabla2" remotefolder="xxxx" />
            </transfer>

    </actions>
</fftg>

in a few words, i have a script performing "actions". every action can be repeated X times.

now, instead of having one large script with a bunch of subroutines etc., i think it would be better to create modules for my app and put the actions in those modules.

for example :

FFTG::Rename
FFTG::Transfer
FFTG::Transfer::SFTP
FFTG::Transfer::FTP

& so on (i've created all these modules and they work fine independently)

and call these modules depending on the actions specified in the XML file. people could create new modules/actions if required (i want the thing to be modular).

now, i don't know how to do this properly.

so my question is : what is the best way to do this please ?

currently, my script is reading these actions like that :

# Load XML file
my $parser = XML::LibXML->new();
my $doc    = $parser->parse_file($FFTG_TSF . "/" . $tid . ".xml");

# Browse XML file
foreach my $transfer ($doc->findnodes('/fftg')) {

    # Grab generic information
    my($env) = $transfer->findnodes('./environment');
    my($desc) = $transfer->findnodes('./description');
    my($user) = $transfer->findnodes('./user');
    print $env->to_literal, "\n";

    # Browse Actions
    foreach my $action ($doc->findnodes('/fftg/actions/*')) {

            my $actiontype = ucfirst($action->nodeName());
            # how do i select a module from the $actiontype here ?     ($actiontype = Rename or Transfer)
            # i can't do : use FFTG::$actiontype::execaction(); or something for example, it doesnt work
            # and is it the right way of doing it ? 

    }
}

but maybe that's not the right way to think about it (i'm using XML::LibXML). how can i call the module "dynamically", using a variable in the name such as FFTG::$actiontype ? and does it mean that i have to have the same subroutine in every module (for example sub execaction), even though i want to send different data to each module ?

any hints ? thanks again regards,

Are the modules object oriented? Is there always the same function in it, like `do` or `execute` or `run` or something similar? Can you please [edit] and include an example module? — simbabque, May 23 '17 at 08:21
well that's a good question, and that's what I'm asking at the end, should I create OO modules or regular ones, and should I use the same kind of subroutine in every module ? (otherwise it's going to be hard to automatically call the module, as every module is doing different stuff). I can edit my post and post a module, but it's not an action-module — olivierg, May 23 '17 at 08:25
It's important that you define an interface. I'm going to write up an answer, but it could take a while because it might be long. — simbabque, May 23 '17 at 08:26
ah that's interesting, thanks again for your help. I suppose I can create an OO module with an execute() subroutine like you said, but then i'll have to pass all parameters to it, or find a way to read the XML from the module, and every module — olivierg, May 23 '17 at 08:30

score 6 · Accepted Answer · answered May 23 '17 at 09:08

First, you need to come up with a clear interface. Every module needs to have the same structure. It doesn't matter if it's OOP or not, but they all need to expose the same interface.

Here's an example non-OOP implementation of FFTG::Rename. I've left out a lot of stuff, but I think it's clear what is happening.

package FFTG::Rename;

use strict;
use warnings;

sub run {
    my ($args) = @_;

    if ($args->{mandatory}) {
        # do stuff here
    }

    # check args...
    # do sanity checks...
    return unless -f $args->{file}->{name}; # or whatever...

    rename $args->{file}->{name}, $args->{file}->{to} or die $!;

    return; # maybe return something meaningful?
}

1; # a module file must end with a true value

Now let's assume we have a bunch of those. How do we load them? There are several ways to do this. I have omitted the part of getting the arguments into the run function. You'll need to take the stuff from the XML and pass it along in a way that's identical to all of those functions, but I think that's not relevant to the question.
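For the part left out above, here is a minimal sketch of how an action node could be flattened into the hash reference that gets passed to run. It is only one possible shape, not something prescribed here: node_to_args is a hypothetical helper, text-only child elements become scalars, and child elements that carry attributes (such as <file>) are always collected into an array reference. Under that assumption a module would read $args->{file}[0]{name} rather than $args->{file}->{name}.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: flatten one action element into a plain hash ref.
sub node_to_args {
    my ($action) = @_;
    my %args;

    # './*' selects only child elements, so comments and whitespace are skipped
    for my $child ( $action->findnodes('./*') ) {
        my $name  = $child->nodeName;
        my @attrs = $child->findnodes('./@*');

        if (@attrs) {
            # Elements with attributes may repeat, so always collect them in a list
            push @{ $args{$name} }, { map { $_->nodeName => $_->value } @attrs };
        }
        else {
            $args{$name} = $child->textContent;
        }
    }
    return \%args;
}

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<fftg><actions>
  <transfer>
    <mandatory>0</mandatory>
    <protocol>SFTP</protocol>
    <file name="blabla" remotefolder="xxxx" />
    <file name="blabla2" remotefolder="xxxx" />
  </transfer>
</actions></fftg>
XML

my ($action) = $doc->findnodes('/fftg/actions/*');
my $args     = node_to_args($action);
print "$args->{protocol}: $_->{name} -> $_->{remotefolder}\n" for @{ $args->{file} };
```

Because the modules only ever see a plain hash reference, the input format could later change from XML to something else without touching them.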

Load all of them

The most obvious is to load all of them in your script manually.

#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# load FFTG modules
use FFTG::Rename;
# ...

Once they are loaded, you can call the function. The exists keyword is handy because it can also be used to check whether a subroutine exists.

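foreach my $action ( $doc->findnodes('/fftg/actions/*') ) {
    my $actiontype = ucfirst( $action->nodeName );
    no strict 'refs';    # needed to call a sub through its symbolic name
    if ( exists &{"FFTG::${actiontype}::run"} ) {
        # &{...}(...) passes the arguments; &{...}->(...) would call run() with
        # the current @_ first and then try to call its return value
        &{"FFTG::${actiontype}::run"}($parsed_node_information);
    } else {
        # this module was not loaded
    }
}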

Unfortunately the non-OO approach requires the no strict 'refs', which is not pretty. It's probably better to do it in an object-oriented fashion. But I'll stick with this for the answer.
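As a hedged alternative that avoids no strict 'refs' while keeping plain functions, the lookup can go through can, which every package inherits from UNIVERSAL. The sketch below inlines a stand-in FFTG::Rename so it runs on its own; dispatch and the argument keys are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for a real FFTG::Rename module file, inlined so the sketch is runnable
package FFTG::Rename {
    sub run {
        my ($args) = @_;
        return "rename $args->{name} to $args->{to}";
    }
}

package main;

# Hypothetical dispatcher: returns run()'s result, or undef if no such action is loaded
sub dispatch {
    my ( $actiontype, $args ) = @_;
    my $class = "FFTG::$actiontype";

    # can() returns a code ref when the package defines run(), false otherwise
    my $code = $class->can('run') or return;
    return $code->($args);
}

print dispatch( 'Rename', { name => 'foo', to => 'bar' } ), "\n";

my $missing = dispatch( 'Nope', {} );
print defined $missing ? "found\n" : "no such action\n";
```

The code reference returned by can is called as an ordinary function, so run still receives only $args.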

The clear downside of this approach is that you need to load all of the modules all the time, and whenever a new one is created, it needs to be added. It is the least complex way, but it also needs the most maintenance.

Automatic loading with a lookup table

Another way is to use automatic loading and a lookup table that defines actions that are allowed. If you want your program to only load the modules on demand because you know that you don't need all of them in every invocation, but you also want to have control over what gets loaded, this makes sense.

Instead of loading all of them, the loading can be outsourced to Module::Runtime.

use Module::Runtime 'require_module';
use Try::Tiny;

my %modules = (
    'rename' => 'FFTG::Rename',

    # ...
);

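foreach my $action ( $doc->findnodes('/fftg/actions/*') ) {
    try {
        # look up the element name; unknown actions are rejected here
        my $name   = $action->nodeName;
        my $module = $modules{$name}
            or die "Action '$name' is not in the lookup table\n";
        require_module($module);

        no strict 'refs';
        &{"${module}::run"}($parsed_node_information);
    }
    catch {
        # something went wrong
        # maybe the module does not exist or it's not listed in the lookup table
        warn $_;
    };
}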

I've also added Try::Tiny to take care of error handling. It gives you control over what to do when stuff goes wrong.

This approach lets you control what actions are allowed, which is good if you're paranoid. But it still requires you to maintain the script and add new modules to the %modules lookup table.

Trusting and loading dynamically

A third, most generic approach would be to use Module::Runtime to load stuff dynamically without the lookup table.

use Module::Runtime 'require_module';
use Try::Tiny;

foreach my $action ( $doc->findnodes('/fftg/actions/*') ) {
    try {
        my $actiontype = ucfirst( $action->nodeName );
        require_module("FFTG::${actiontype}");

        no strict 'refs';
        &{"FFTG::${actiontype}::run"}($parsed_node_information);
    }
    catch {
        # something went wrong
        # the module does not exist
        warn $_;
    };
}

This needs the least maintenance, but it's a bit more dangerous. You don't know what data is coming in, and there is no longer a sanity check. I can't think of a way to exploit this off the top of my head, but there could be one. On the plus side, you no longer have to edit the script or keep a module list up to date.
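If the missing sanity check is a worry, one option is to validate the element name before building a module name from it. The following is a rough sketch under my own assumptions: module_for_action is a hypothetical helper, and the allowed pattern (a letter followed by letters, digits or underscores) is just one conservative choice.

```perl
use strict;
use warnings;

# Hypothetical helper: map an XML element name to a module name, or return
# undef when the name contains anything other than letters, digits and '_'.
sub module_for_action {
    my ($node_name) = @_;
    return unless defined $node_name
        && $node_name =~ /\A[A-Za-z][A-Za-z0-9_]*\z/;
    return 'FFTG::' . ucfirst $node_name;
}

for my $name ( 'rename', 'transfer', 'Transfer::SFTP', '../evil' ) {
    my $module = module_for_action($name);
    printf "%-16s => %s\n", $name, $module // '(rejected)';
}
```

Rejecting '::' on purpose means the XML can only reach modules directly under FFTG::, and the result can then be handed to require_module.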

Conclusion

I would probably go with the second approach. It gives you control and still keeps stuff dynamic. I would not go with the non-OOP approach I have used.

You could keep it non-OOP and still get rid of the no strict 'refs' by using the -> object notation to call class methods. Then your package would look like this.

package FFTG::Rename;

use strict;
use warnings;

sub run {
    my (undef, $args) = @_;

    # ...
}

1; # a module file must end with a true value

The undef is to not capture $class (not $self), because we don't need it. Or maybe we do, for logging. It depends. But with this, you could essentially call the class method as follows for the lookup table solution.

my $module = $modules{ $action->nodeName }
    or die "Unknown action\n";
require_module($module);
$module->run($parsed_node_information);

This is obviously way clearer and preferable.
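To make the class-method variant concrete, here is a small self-contained sketch of the lookup table combined with ->run. The two packages are inlined stand-ins for real module files, so the require_module step is left out; run_action, the argument keys and the returned strings are invented for the example.

```perl
use strict;
use warnings;

# Inlined stand-ins for real module files; the names match the question
package FFTG::Rename {
    sub run {
        my ( $class, $args ) = @_;
        return "$class: $args->{name} -> $args->{to}";
    }
}

package FFTG::Transfer {
    sub run {
        my ( $class, $args ) = @_;
        return "$class: $args->{name} via $args->{protocol}";
    }
}

package main;

# Only actions listed here may run
my %modules = (
    'rename'   => 'FFTG::Rename',
    'transfer' => 'FFTG::Transfer',
);

# Hypothetical driver: look the action up, then call run() as a class method
sub run_action {
    my ( $name, $args ) = @_;
    my $module = $modules{$name}
        or die "Action '$name' is not allowed\n";
    return $module->run($args);
}

print run_action( 'rename',   { name => 'foo', to       => 'bar' } ),  "\n";
print run_action( 'transfer', { name => 'bar', protocol => 'SFTP' } ), "\n";

# An unlisted action is rejected before any module code is called
eval { run_action( 'delete', {} ); 1 } or print "Error: $@";
```

Here run keeps $class instead of discarding it with undef, which is handy for logging which module handled the action.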

thanks again for this clear explanation, it makes perfect sense ! I will make some tests this afternoon and answer you afterwards (and mark the question as resolved) ! thanks again for the time it took for you to write this — olivierg, May 23 '17 at 11:14
just a quick question before I answer, you wrote my ($args) = @_; in the subroutine, but is there a best way of sending the arguments (contents of the action) to the sub ? I suppose I have to grab them all in a hash, and send them to run(%hash) ? — olivierg, May 24 '17 at 09:04
my bad, I just noticed the $parsed_node_information, I suppose I have to fill this variable with the action info — olivierg, May 24 '17 at 09:05
@olivierg yes. I figured since you're already deep into the whole XML stuff, you will know how to use LibXML to do that. I would have to look it up as well, and it's not important for the concepts, so I didn't do it. I think a simple hashref with key/value pairs a la XML::Simple (but don't use that!!!11elf) should work. I think abstracting it so you don't pass in the actual `$node` from the XML parser makes sense because that way you can also change the delivery format from XML to e.g. YAML or plain text or whatever you want, but your modules remain the same. — simbabque, May 24 '17 at 09:08
i get it, thanks again (and indeed that wasn't the goal of my question, maybe i'll create a second topic for that if I'm not able to do it, as I'm new to libxml as well, I'm testing your solution right now (will do it without args for now) and i'll mark it as resolved if everything is ok, thanks again for your help ! — olivierg, May 24 '17 at 09:18
