0

I want to convert excel-files to csv-files with Perl. For convenience I like to use the module File::Slurp for read/write operations. I need it in a subfunction.

While printing out to the screen, the program generates the desired output, the generated csv-files unfortunately just contain one row with semicolons, field are empty.

Here is the code:

#!/usr/bin/perl

use File::Copy;
use v5.14;
use Cwd;
use File::Slurp;
use Spreadsheet::ParseExcel;


sub xls2csv {
    my $currentPath = getcwd();
    my @files       = <$currentPath/stage0/*.xls>;

    for my $sourcename (@files) {
        print "Now working on $sourcename\n";
        my $outFile = $sourcename;
        $outFile =~ s/xls/csv/g;
        print "Output CSV-File: ".$outFile."\n";
        my $source_excel = new Spreadsheet::ParseExcel;
        my $source_book  = $source_excel->Parse($sourcename)
          or die "Could not open source Excel file $sourcename: $!";

        foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
        {
            my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

            next unless defined $source_sheet->{MaxRow};
            next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
            next unless defined $source_sheet->{MaxCol};
            next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

            foreach my $row_index (
                $source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
            {
                foreach my $col_index (
                    $source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
                {
                    my $source_cell =
                      $source_sheet->{Cells}[$row_index][$col_index];
                    if ($source_cell) {

                        print $source_cell->Value, ";"; # correct output!

                        write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); # only one row of semicolons with empty fields!
                    }
                }
                print "\n";
            }
        }

    }
}

xls2csv();

I know it has something to do with the parameter passing in the write_file function, but couldn't manage to fix it.

Has anybody an idea?

Thank you very much in advance.

royskatt
  • 1,190
  • 2
  • 15
  • 35
  • 1
    Add `use strict; use warnings;` and report the errors/warnings you get. I think `use v5.14` activates strict, but not warnings.. but use them both to be sure. – TLP Aug 31 '13 at 12:21
  • You should know that you risk overwriting your original files with the line `$outFile =~ s/xls/csv/g`. In Windows, `<*.xls>` will match something like `foo.XLS`, but your regex is case sensitive, and will not perform the substitution, so your in and out file names will be the same. Use the `/i` to ignore case. – TLP Aug 31 '13 at 12:29
  • Thanks for the hint with case sensitive. I added use strict; use warnings; The only warning I get is "Wide character in print at etl.pl line 45.". But that's because I have characters like "ö" in my files. – royskatt Aug 31 '13 at 12:41
  • In the documentation, the functions are `parse()` and `value()`, but you have them capitalized `Parse()` and `Value()`. Does that actually work? – TLP Aug 31 '13 at 13:01
  • Yes, it does work, even when having them capitalized. – royskatt Sep 01 '13 at 11:47

2 Answers2

1

write_file will overwrite the file unless the append => 1 option is given. So this:

write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); 

Will write a new file for each new cell value. It does however not match your description of "only one row of semi-colons of empty fields", as it should only be one semi-colon, and one value.

I am doubtful towards this sentiment from you: "For convenience I like to use the module File::Slurp". While the print statement works as it should, using File::Slurp does not. So how is that convenient?

What you should do, if you still want to use write_file is to gather all the lines to print, and then print them all at once at the end of the loop. E.g.:

$line .= $source_cell->Value . ";";   # use concatenation to build the line
...
push @out, "$line\n";                 # store in array
...
write_file(...., \@out);              # print the array

Another simple option would be to use join, or to use the Text::CSV module.

TLP
  • 66,756
  • 10
  • 92
  • 149
  • As I mentioned below, the first problem was solved, though not very elegantly. Unfortunately another one popped up: When I have blank columns in the Excel file, no corresponding fields in the CSV-file are generated (semicolons missing). :-( – royskatt Sep 01 '13 at 11:54
  • Another question to your propostition above: why do you use a reference in write_file: \@out and not just @out? What difference makes that? – royskatt Sep 01 '13 at 11:58
  • @royskatt When I skimmed through the documentation I saw that it was (possibly) a minor optimization, in that you don't have to copy the data. When you pass an array, the array is expanded and the elements copied to `@_`. As for blank fields... you will have to make sure a value is printed even if the cell is blank. For example, it is probably not correct to check `if ($source_cell)`, since it might mean blank cells are skipped. You can use the defined-or operator, e.g. `my $value = $source_cell->value // ""` – TLP Sep 01 '13 at 12:29
  • Thanks for clarifing the use of the reference. I read the documentation about defined-or, looks quite handy. Actually if the function value hits an empty cell, the program stops, claiming value can't be applied on undefined. Btw, do you accept paypal to fix it? :) http://62.113.243.155/xls2csv.tar – royskatt Sep 01 '13 at 16:15
  • I suspect it would be easier for me to write the program from scratch than to fix it. I could certainly write it for you if you paid me, though not tonight, I am dead tired. – TLP Sep 01 '13 at 17:33
  • I would need test data and expected output. – TLP Sep 01 '13 at 17:36
  • Testdata is included, expected output as well. Sure, take a rest for now :). How much would you want for this? – royskatt Sep 01 '13 at 18:00
  • Slept like a log. Let me know when you are online. – TLP Sep 02 '13 at 08:33
  • Good to hear :). stackoverflow.com doesn't offer private message, do they? Couldn't find any button for that. Regarding my problem: when I'm back home there's one thing that came up in my mind, which could solve it. If not, how much would you take for the job? – royskatt Sep 02 '13 at 10:48
  • Let me know when you've seen this, because I intend to delete the comment afterwards so that my email is not public. – TLP Sep 02 '13 at 11:47
  • Did you want me to help you with this? – TLP Sep 02 '13 at 14:39
0

Well, in this particular case, File::Slurp was indeed complicating this for me. I just wanted to avoid to repeat myself, which I did in the following clumsy working solution:

#!/usr/bin/perl

use warnings;
use strict;
use File::Copy;
use v5.14;
use Cwd;
use File::Basename;
use File::Slurp;
use Tie::File;
use Spreadsheet::ParseExcel;
use open qw/:std :utf8/;

# ... other functions

sub xls2csv {
    my $currentPath = getcwd();
    my @files       = <$currentPath/stage0/*.xls>;
    my $fh;

    for my $sourcename (@files) {
        say "Now working on $sourcename";
        my $outFile = $sourcename;
        $outFile =~ s/xls/csv/gi;
        if ( -e $outFile ) {
            unlink($outFile) or die "Error: $!";
            print "Old $outFile deleted.";
        }
        my $source_excel = new Spreadsheet::ParseExcel;
        my $source_book  = $source_excel->Parse($sourcename)
          or die "Could not open source Excel file $sourcename: $!";

        foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
        {
            my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

            next unless defined $source_sheet->{MaxRow};
            next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
            next unless defined $source_sheet->{MaxCol};
            next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

            foreach my $row_index (
                $source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
            {
                foreach my $col_index (
                    $source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
                {
                    my $source_cell =
                      $source_sheet->{Cells}[$row_index][$col_index];
                    if ($source_cell) {
                        print $source_cell->Value, ";";
                        open( $fh, '>>', $outFile ) or die "Error: $!";
                        print $fh $source_cell->Value, ";";
                        close $fh;
                    }
                }
                print "\n";
                open( $fh, '>>', $outFile ) or die "Error: $!";
                print $fh "\n";
                close $fh;
            }
        }

    }
}

xls2csv();

I'm actually NOT happy with it, since I'm opening and closing the files so often (I have many files with many lines). That's not very clever in terms of performance.

Currently I still don't know how to use the split or Text:CSV in this case, in order to put everything into an array and to open, write and close each file only once.

Thank you for your answer TLP.

royskatt
  • 1,190
  • 2
  • 15
  • 35
  • You don't have to open for appending (`>>`) once for each value. Just use `>` mode once per file. – TLP Sep 01 '13 at 12:25