perl class to parse specified csv file types of similar data with different columns

Question

I'm trying to write a Moose class that parses csv files of slightly different formats with headers and return a list of objects representing the data in the files. Here's a simplified version of the code:

package MyParser;

use Moose;
use namespace::autoclean;
use Text::CSV_XS;

use MyData;  #class that represents data for each row of csv

has 'type' => ( is => 'ro', isa => 'Str', required => 1 );

sub get_data {
    my($self, $file) = @_;

    open my $fh, '<', $file || die "Can't open file $!";

    my $csv = Text::CSV_XS->new;
    $csv->column_names($csv->getline($fh));

    my @data;
    if ($self->type eq 'filetype1'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{col1},
                                    field2 => $row->{col2},
                                    field3 => $row->{col3},
                                    );
        }
    }
    elsif ($self->type eq 'filetype2'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{colA},
                                    field3 => _someFunction($row->{colB}), # _someFunction does some manipulation with the data
                                    field5 => $row->{colC},
                                    );
        }
    }
    elsif ($self->type eq 'filetype3'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{column_1},
                                    field2 => _someOtherFunction($row->{column_2}),  # _someOtherFunction does some manipulation with the data
                                    field3 => $row->{column_3},
                                    field4 => $row->{column_4},
                                    field5 => $row->{column_5},
                                    );
        }
    }
    close $fh;

    return \@data;
}

__PACKAGE__->meta->make_immutable;

1;

The class MyData is just a simple data structure where some attributes have default attributes (hence the different initializations from above). Some of the csv file types also have columns that require some manipulation (e.g. a number that needs to go into a simple formula) which are file type dependent. This MyData is then returned to my main script to insert into a table in oracle.

My aim is for MyParser to handle certain specified types of csv files that can be extended if needed and return a list of MyData from get_data method. However, the method as it is now doesn't seem like an elegant/simple solution to what I'm trying to solve.

So what I'd like ask/comments on is:

Is there a better/simpler way of solving this (maybe via a design pattern such as the Factory pattern)?
Or am I trying to solve something that looks simple and making things really convoluted?

Snorri · Accepted Answer · 2012-07-30T09:22:48.940

Instead of the repeated code in the if-elsif-elsif construct it would be cleaner if you put the field mapping rules into a configuration file. For example with a data structure like this:

{
    filetype1 => {
        field1 => 'col1',
        field2 => 'col2',
        field3 => 'col3',
    },
    filetype2 => {
        field1 => 'colA',
        field3 => {
            function => sub {},
            params   => ['colB'],
        },
        field5 => 'colC',
    },
    filetype3 => {
        field1 => 'column1',
        field2 => {
            function => sub {},
            params   => ['column_2'],
        },
        field3 => 'column_3',
        field4 => 'column_4',
        field5 => 'column_5',
    },
};

Then you can replace the if-elsif-elsif construct with something like the following (assuming the mapping rules have been loaded and stored in $filetype_mappings):

while (my $row = $csv->getline_hr($fh)) {
    my %my_data = map {
        my $m = $filetype_mappings->{$_};
        $_ => ( ref $m ? &{$m->{function}}(map {$row->{$_}} @{$m->{params}})
                       : $row->{$m}
        );
    } keys %$filetype_mappings;
    push @data, MyData->new(%my_data);
}

Having the mapping rules separate should make it easy to add support for new file types or make changes to the existing ones in one place.

Thanks for the code example, using a config file was one of the ways I was thinking I could go down, so it's reassuring to have somebody else thinking along the same line — 1stdayonthejob, Jul 30 '12 at 21:52

score 0 · Answer 2 · answered Jul 29 '12 at 23:35

0

It's not a very bad idea to do it this way. Let's keep things simple!

OTOH you could create a base class for MyData which have an "abstract" method "parseData" called from the constructor. You could have say MyData, MyData, etc..., all implementing their parseData methods. Then in get_data you would simply do:

my($self, $file) = @_;

open my $fh, '<', $file || die "Can't open file $!";

my $csv = Text::CSV_XS->new;
$csv->column_names($csv->getline($fh));

my @data;
while (my $row = $csv->getline_hr($fh)){
    my $class = 'MyData'.$self->type;
    push (@data, $class->new($row));
}
close $fh;
return \@data;

answered Jul 29 '12 at 23:35

Csongor Fagyal

514
2
10

this was similar to one of the ways I thought about using as well, good to know I wasn't just complicating things – 1stdayonthejob Jul 30 '12 at 21:54
I do use factory classes in Perl, but in script languages you can often find a different solution - mostly because you don't have strong typing. Most design patterns were made for C++-like languages in mind, but when you have things like direct access to the symbol table, it's a completely different story :) – Csongor Fagyal Jul 31 '12 at 00:48

perl class to parse specified csv file types of similar data with different columns

2 Answers2