1

I'm trying to write a Moose class that parses csv files of slightly different formats with headers and return a list of objects representing the data in the files. Here's a simplified version of the code:

package MyParser;

use Moose;
use namespace::autoclean;
use Text::CSV_XS;

use MyData;  #class that represents data for each row of csv

has 'type' => ( is => 'ro', isa => 'Str', required => 1 );

sub get_data {
    my($self, $file) = @_;

    open my $fh, '<', $file || die "Can't open file $!";

    my $csv = Text::CSV_XS->new;
    $csv->column_names($csv->getline($fh));

    my @data;
    if ($self->type eq 'filetype1'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{col1},
                                    field2 => $row->{col2},
                                    field3 => $row->{col3},
                                    );
        }
    }
    elsif ($self->type eq 'filetype2'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{colA},
                                    field3 => _someFunction($row->{colB}), # _someFunction does some manipulation with the data
                                    field5 => $row->{colC},
                                    );
        }
    }
    elsif ($self->type eq 'filetype3'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{column_1},
                                    field2 => _someOtherFunction($row->{column_2}),  # _someOtherFunction does some manipulation with the data
                                    field3 => $row->{column_3},
                                    field4 => $row->{column_4},
                                    field5 => $row->{column_5},
                                    );
        }
    }
    close $fh;

    return \@data;
}

__PACKAGE__->meta->make_immutable;

1;

The class MyData is just a simple data structure where some attributes have default attributes (hence the different initializations from above). Some of the csv file types also have columns that require some manipulation (e.g. a number that needs to go into a simple formula) which are file type dependent. This MyData is then returned to my main script to insert into a table in oracle.

My aim is for MyParser to handle certain specified types of csv files that can be extended if needed and return a list of MyData from get_data method. However, the method as it is now doesn't seem like an elegant/simple solution to what I'm trying to solve.

So what I'd like ask/comments on is:

Is there a better/simpler way of solving this (maybe via a design pattern such as the Factory pattern)?
Or am I trying to solve something that looks simple and making things really convoluted?

2 Answers2

1

Instead of the repeated code in the if-elsif-elsif construct it would be cleaner if you put the field mapping rules into a configuration file. For example with a data structure like this:

{
    filetype1 => {
        field1 => 'col1',
        field2 => 'col2',
        field3 => 'col3',
    },
    filetype2 => {
        field1 => 'colA',
        field3 => {
            function => sub {},
            params   => ['colB'],
        },
        field5 => 'colC',
    },
    filetype3 => {
        field1 => 'column1',
        field2 => {
            function => sub {},
            params   => ['column_2'],
        },
        field3 => 'column_3',
        field4 => 'column_4',
        field5 => 'column_5',
    },
};

Then you can replace the if-elsif-elsif construct with something like the following (assuming the mapping rules have been loaded and stored in $filetype_mappings):

while (my $row = $csv->getline_hr($fh)) {
    my %my_data = map {
        my $m = $filetype_mappings->{$_};
        $_ => ( ref $m ? &{$m->{function}}(map {$row->{$_}} @{$m->{params}})
                       : $row->{$m}
        );
    } keys %$filetype_mappings;
    push @data, MyData->new(%my_data);
}

Having the mapping rules separate should make it easy to add support for new file types or make changes to the existing ones in one place.

Snorri
  • 446
  • 2
  • 10
  • Thanks for the code example, using a config file was one of the ways I was thinking I could go down, so it's reassuring to have somebody else thinking along the same line – 1stdayonthejob Jul 30 '12 at 21:52
0

It's not a very bad idea to do it this way. Let's keep things simple!

OTOH you could create a base class for MyData which have an "abstract" method "parseData" called from the constructor. You could have say MyData, MyData, etc..., all implementing their parseData methods. Then in get_data you would simply do:

my($self, $file) = @_;

open my $fh, '<', $file || die "Can't open file $!";

my $csv = Text::CSV_XS->new;
$csv->column_names($csv->getline($fh));

my @data;
while (my $row = $csv->getline_hr($fh)){
    my $class = 'MyData'.$self->type;
    push (@data, $class->new($row));
}
close $fh;
return \@data;
Csongor Fagyal
  • 514
  • 2
  • 10
  • this was similar to one of the ways I thought about using as well, good to know I wasn't just complicating things – 1stdayonthejob Jul 30 '12 at 21:54
  • I do use factory classes in Perl, but in script languages you can often find a different solution - mostly because you don't have strong typing. Most design patterns were made for C++-like languages in mind, but when you have things like direct access to the symbol table, it's a completely different story :) – Csongor Fagyal Jul 31 '12 at 00:48