1

Below is a code that dumps an SQL query to an aligned text table format.

sub sql_command {
    my ($self,$str) = @_;
    my $s = $self->{_db}->prepare($str) or die $!;
    $s->execute() or die $!;
    my $table;
    push @$table, [ map { defined $_ ? $_ : "undef" } @{$s->{'NAME'}}];
    while(my $row = $s->fetch) {
          push @$table, [ map{ defined $_ ? $_ : "undef" }@$row ];
    }
    return box_format($table);;
}


sub box_format {
    my $table = shift;
    my $n_cols = scalar @{$table->[0]};

    my $tb = Text::Table->new(\'| ', '', (\' | ','')x($n_cols-1), \' |+');
    $tb->load(@$table);
    my $rule = $tb->rule(qw/- +/);
    my @rows = $tb->body();
    return $rule, shift @rows, $rule, @rows, $rule
           if @rows;
}

The output is as expected. But performance-wise, it takes about ~30sec to process on about an output file of size ~5MB.

Is there a better way to achieve that in terms of performance?

Thanks!

Mattan
  • 733
  • 7
  • 19

1 Answers1

2

There's several possible performance issues with that code. First is that in building $table you're pulling the entire data set into memory. This can consume a lot of memory. You'd be much better off formatting the data as it comes out. This means instead of using Text::Table->load you'd use Text::Table->add.

sub sql_command {
    my ($self,$sql) = @_;
    my $sth = $self->{_db}->prepare($sql) or die $!;
    $sth->execute() or die $!;

    return box_format_from_query($sth);
}

sub box_format_from_query {
    my $sth = shift;
    my $headers = [ map { defined $_ ? $_ : "undef" } @{$s->{'NAME'}}];
    my $num_cols = @$headers;

    my $table = Text::Table->new(\'| ', '', (\' | ','')x($num_cols-1), \' |+');
    while(my $row = $s->fetch) {
          $table->add(map { defined $_ ? $_ : "undef" } @$row );
    }

    my $rule = $tb->rule(qw/- +/);
    my @rows = $tb->body();
    return $rule, shift @rows, $rule, @rows, $rule
           if @rows;
}

The second performance problem is likely Text::Table itself. As the table methods work on copies of the data (that is, you don't pass in the data as references) its not likely to have been written with large data sets in mind. The only way to find out is to profile your code. Have a look at Devel::NYTProf. You should be able to figure out where your code is spending its time. If it turns out to be inside Text::Table methods, you should contact the author or consider another method of producing formatted output.

The third performance problem might be your query. You don't show your query, but its entirely possible its inefficient. Devel::NYTProf will tell you if you're spending a lot of time at $sth->execute. If so, you have another question to ask. :)

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Thanks. 1. Loading the data line by one did help a bit. Nice idea. :) 2. The query is just a simple "SELECT * from table". It is not the bottleneck. 3. Text::Table itself is indeed the problem. I will contact the author. Do you happen to know other methods of producing formatted output? Again, thanks! – Mattan Mar 12 '13 at 14:13
  • 1
    @Mattan Glad you got it worked out! Sorry, I don't do much data formatting. If you provided a sample of the format you're outputting maybe we could help, figuring out what Text::Table might produce is inscrutable to me. And, you guessed it, sounds like another question! – Schwern Mar 12 '13 at 21:57