I have an 8MB CSV file. Using Spreadsheet::Read it takes 10 seconds to read:

use Spreadsheet::Read;   # exports ReadData by default
my $book = ReadData ( 'file.csv' );
my @rows = Spreadsheet::Read::rows($book->[1]); # first sheet
foreach my $i (2 .. scalar @rows) { # ignore first header row
    my $first = $rows[$i-1][1];
    #...
}

Using Text::CSV_XS, it takes 1 second:

use Text::CSV_XS;
open my $fh, "<:encoding(utf8)", 'file.csv' or die $!;
my $csv = Text::CSV_XS->new ({ diag_verbose=>1, auto_diag=>1, binary=>1, sep_char=>";" });
$csv->getline($fh); # Ignore Header
while (my $row = $csv->getline ($fh)) { 
    my $first = $row->[1];
    #...
}
close ($fh);

Can I force Spreadsheet::Read to use Text::CSV_XS and expect similar performance? I tried:

  1. Specifying a parser:
my $book = Spreadsheet::Read->new (
    'file.csv',
    sep => ';',
    parser => 'csv',
    );
  2. Setting the parser environment variable:
$ENV{SPREADSHEET_READ_CSV} = 'Text::CSV_XS';

Output of Spreadsheet::Read->parsers() is:

$VAR1 = {
          'ext' => 'csv',
          'def' => '',
          'mod' => 'Text::CSV',
          'min' => '1.17',
          'vsn' => '-'
        };
$VAR2 = {
          'ext' => 'csv',
          'def' => '',
          'mod' => 'Text::CSV_PP',
          'min' => '1.17',
          'vsn' => '-'
        };
$VAR3 = {
          'vsn' => '1.50',
          'min' => '0.71',
          'ext' => 'csv',
          'mod' => 'Text::CSV_XS',
          'def' => '*'
        };
$VAR4 = {
          'min' => '0.01',
          'vsn' => '0.87',
          'def' => '*',
          'mod' => 'Spreadsheet::Read',
          'ext' => 'sc'
        };
$VAR5 = {
          'vsn' => '0.65',
          'min' => '0.34',
          'ext' => 'xls',
          'mod' => 'Spreadsheet::ParseExcel',
          'def' => '*'
        };
$VAR6 = {
          'min' => '0.24',
          'vsn' => '0.27',
          'ext' => 'xlsm',
          'def' => '*',
          'mod' => 'Spreadsheet::ParseXLSX'
        };
$VAR7 = {
          'min' => '0.24',
          'vsn' => '0.27',
          'def' => '*',
          'mod' => 'Spreadsheet::ParseXLSX',
          'ext' => 'xlsx'
        };
$VAR8 = {
          'min' => '0.13',
          'vsn' => '-',
          'ext' => 'xlsx',
          'def' => '',
          'mod' => 'Spreadsheet::XLSX'
        };
$VAR9 = {
          'vsn' => undef,
          'min' => '',
          'ext' => 'zzz2',
          'mod' => 'Z20::Just::For::Testing',
          'def' => '*'
        };

Also:

$ perl -MSpreadsheet::Read -E'say Spreadsheet::Read::parses( "csv" )'
Text::CSV_XS
$ perl -MText::CSV_XS -E'say Text::CSV_XS->VERSION'
1.50
h q
  • It should use Text::CSV_XS if version 0.71 is installed. What's the output of `perl -Mv5.14 -MSpreadsheet::Read -e'say Spreadsheet::Read::parses( "csv" )'`? – ikegami May 26 '23 at 13:20
  • @ikegami the output is: `Text::CSV_XS`. Also `perl -MText::CSV_XS -E 'say Text::CSV_XS->VERSION'` prints `1.50` – h q May 27 '23 at 11:03
  • So Text::CSV_XS *is* being used. – ikegami May 27 '23 at 16:48

1 Answer

You asked if you could force Spreadsheet::Read to use Text::CSV_XS.

But you also said the output from the following is Text::CSV_XS.

perl -Mv5.14 -MSpreadsheet::Read -e'say Spreadsheet::Read::parses( "csv" )'

This demonstrates that Text::CSV_XS is being used.

ikegami
  • Thank you @ikegami. Should I not expect similar speed when using `Spreadsheet::Read`? Why is it significantly slower? [Sample CSV](https://github.com/datablist/sample-csv-files/raw/main/files/organizations/organizations-100000.csv) – h q May 27 '23 at 17:19
  • It does things to each cell to convert it to a spreadsheet. – ikegami May 27 '23 at 18:58
  • This is helpful. Would you kindly update your answer to reflect your note about performance, as this was the main motive behind my question? Thanks again. – h q May 27 '23 at 19:36
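
The comment about per-cell work suggests an experiment. Below is a minimal sketch, not a measured result: it assumes the documented `cells` option of `ReadData` (which controls whether the "A1"-style named cells are generated) accounts for part of the overhead. The temporary file, its column names, the row count `$n`, and the helper `timed_read` are all made up for illustration; substitute the real file.csv to see whether the difference matters for your data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);
use Spreadsheet::Read;

# Hypothetical stand-in for the question's file.csv: a ';'-separated file
# with one header row. Raise $n to approximate the 8 MB input.
my $n = 1_000;
my ($out, $file) = tempfile(SUFFIX => '.csv', UNLINK => 1);
print {$out} "id;name;value\n";
print {$out} "$_;name$_;" . ($_ * 2) . "\n" for 1 .. $n;
close $out or die $!;

# Time one ReadData call with the given options and return the book.
sub timed_read {
    my ($label, %opt) = @_;
    my $t0   = time;
    my $book = ReadData($file, sep => ';', %opt) or die "cannot read $file";
    printf "%-12s %.3fs\n", $label, time - $t0;
    return $book;
}

my $full = timed_read('default');                 # {cell} grid plus "A1"-style keys
my $lean = timed_read('cells => 0', cells => 0);  # {cell} grid only

# rows() reads from {cell}, so it still works on the lean book.
my @rows = Spreadsheet::Read::rows($lean->[1]);
shift @rows;                                      # drop the header row
my @second_col = map { $_->[1] } @rows;
```

Even with `cells => 0`, Spreadsheet::Read still loads the whole file into an in-memory grid, so it should not be expected to match a streaming `getline` loop. If only row-by-row access is needed, using Text::CSV_XS directly, as in the question, is likely to remain the faster route.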