
I'm using Strawberry Perl 5.18.2.2 on 64-bit Windows 8.1 with Text::CSV installed, and I'm trying to parse the following CSV file from PayPal.

Date, Time, Time Zone, Name, Type, Status, Gross, Fee, Net, From Email Address, To Email Address, Transaction ID, Counterparty Status, Address Status, Item Title, Item ID, Shipping and Handling Amount, Insurance Amount, Sales Tax, Option 1 Name, Option 1 Value, Option 2 Name, Option 2 Value, Auction Site, Buyer ID, Item URL, Closing Date, Escrow Id, Invoice Id, Reference Txn ID, Invoice Number, Custom Number, Receipt ID, Balance, Address Line 1, Address Line 2/District/Neighborhood, Town/City, State/Province/Region/County/Territory/Prefecture/Republic, Zip/Postal Code, Country, Contact Phone Number, 
"5/5/2014","17:44:45","PDT","Jack Payer","Payment Received","Completed","1.00","0.00","1.00","jack@yahoo.com","payer@gmail.com","8HT05934290026J","Verified","","","","","","","","","","","","","","","","","","","","","1.67","","","","","","","",

I open, read, and parse the file using:

open(READ, $sourcefile)
open($fh,"+>:encoding(utf8)","$base.m.csv")

undef $/;
$_ = <READ>;

# convert Unix line ending to dos
$_ =~ s/\r?\n|\r/\r\n/g;
print $fh $_;
close READ;

$/ = "\r\n";
seek($fh, 0, 0);

my $csv = Text::CSV->new({ allow_whitespace => 1, binary => 1 }); # should set binary attribute.
my $row=$csv->getline ($fh);
my @fields = @$row;
$csv->column_names (@fields);
$row = $csv->getline_hr($fh)

Here is the problem: starting with the Gross column, the row is read incorrectly, which then shifts all the subsequent columns. $row->{'Gross'} reads 1.00,^@.00 instead of 1.00.

The ^@ symbol is how gvim displays a NUL character, which I see when I open the output file. This is where the issue starts.

Subsequently, $row->{'Fee'} reads 1.00 instead of 0.00, and $row->{'Net'} reads jack@yahoo.com instead of 1.00.

Any idea why it's getting messed up?

EDIT: FULL CODE AS REQUESTED here: https://www.dropbox.com/s/p064pxitmw3jwmj/csv2qif.pl

EDIT: If I change

"1.00","0.00","1.00"

in the CSV file to

"1.00","2.00","3.00"

it works fine. In fact, any field starting with "0.xx" ends up with its leading 0 parsed as a NUL character. I don't understand where the NUL character comes from. I'm lost; is something wrong with the CSV parser?

rboy
  • Can you show your complete program? What you have now doesn't show how you define several variables. – Hunter McMillen Jun 12 '14 at 16:18
  • @rboy: The `^@` is a `NUL` character, and it isn't in the sample data that you show. The data processes just fine for me. Also, Perl will change the line endings itself. Doing it in your program as well means it is done twice and so the data gets messed up. – Borodin Jun 12 '14 at 16:47
  • @HunterMcMillen I've added a link to the full code. – rboy Jun 12 '14 at 16:55
  • @Borodin are you suggesting I remove the line $_ =~ s/\r?\n|\r/\r\n/g; – rboy Jun 12 '14 at 16:55
  • @rboy: That's right. If you change all the newlines to `\r\n` then Perl will do it again and you'll get `\r\r\n` – Borodin Jun 12 '14 at 17:01
  • darn that broke something else in the script down about some required fields. Sorry I'm not perl expert, I'm trying to customize a script written by someone else – rboy Jun 12 '14 at 17:04
  • but that doesn't explain why it would be reading a `NUL` in the middle of the row – rboy Jun 12 '14 at 17:05
  • You should take out all mention of `\r\n`. Unix input file should be opened as `open my $fh, '<:unix', $filename or die $!` – Borodin Jun 12 '14 at 17:12
  • Tried that, opened in Unix mode (reading and writing), commented the line and also changed the subsequent line from '\r\n' to '\n', still getting the `NUL` character – rboy Jun 12 '14 at 17:24
  • Have you verified that there isn't a null character in the source file, using something like `od`? – ThisSuitIsBlackNot Jun 12 '14 at 17:32
  • yes I checked with gvim there are no null characters in the source file (but I see it in the output file) – rboy Jun 12 '14 at 17:41
  • The code snippet you posted doesn't match the output you describe. `$csv->getline` returns an array reference, not a hash reference like you access with `$row->{'Gross'}`. The full code you linked to is long and convoluted; I wouldn't expect SO users to read through all of that to find your bug (just judging by the use of `no strict "refs";` and the lack of `use warnings;`, I'm sure there are other bugs as well). Post a short, *reproducible* example that demonstrates your problem. – ThisSuitIsBlackNot Jun 12 '14 at 17:59
  • @ThisSuitIsBlackNot sorry about that, again I'm a newbie here but thanks for pointing it out, I've fixed it and included all the code required to get to the hash reference – rboy Jun 12 '14 at 18:03
  • See my latest update, if I change "1.00","0.00","1.00" to "1.00","2.00","3.00" it works fine – rboy Jun 12 '14 at 18:22
  • @ThisSuitIsBlackNot come on, let's see your prowess; apart from marking down a question and negativity, how about some productive input? – rboy Jun 12 '14 at 19:00
  • Again, you haven't posted a *reproducible* example of your problem. The code snippet you posted doesn't even compile. Write a small test program that others can run that demonstrates the issue. Having said that, I found this in the [Text::CSV changelog](https://metacpan.org/changes/distribution/Text-CSV) under v1.12 (May 2009): *`getline()` didn't handle a line having null (ex. `"0`)*. Are you perhaps using an old version? Check with `perl -MText::CSV -e 'print $Text::CSV::VERSION'` – ThisSuitIsBlackNot Jun 12 '14 at 19:08
  • There is a similar comment in the [changelog](https://metacpan.org/changes/distribution/Text-CSV) under v1.19 (October 2010): *`getline` didn't work correctly with 0 or null containing lines*. – ThisSuitIsBlackNot Jun 12 '14 at 19:11
  • So your code doesn't work either: `perl -MText::CSV -e 'print $Text::CSV::VERSION'` gives me `Can't find string terminator "'" anywhere before EOF at -e line 1.` – rboy Jun 12 '14 at 19:27
  • but your comment on the `getline` not working is great! I upgraded to 5.20.0.1 32bit and reinstalled Text::CSV and now it works. please post your answer and I'll be happy to accept it – rboy Jun 12 '14 at 19:28
  • @rboy - You're on Windows, right? Try it with `"` instead of `'`: `perl -MText::CSV -e "print $Text::CSV::VERSION"` – Jim Davis Jun 12 '14 at 19:28
  • @JimDavis thanks that works and the answer I got back is 1.32 – rboy Jun 12 '14 at 19:30

1 Answer


Thanks to @ThisSuitIsBlackNot, the issue turned out to lie with Text::CSV. Quoting the comment:

I found this in the Text::CSV changelog under v1.12 (May 2009): *`getline()` didn't handle a line having null (ex. `"0`)*. Are you perhaps using an old version? Check with `perl -MText::CSV -e 'print $Text::CSV::VERSION'`

The issue was a combination of Strawberry Perl and Text::CSV. I upgraded to the 32-bit Strawberry Perl 5.20.0.1 Portable edition and reinstalled Text::CSV, which gave me version 1.32 (the latest at this point), and it started working fine.
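
For anyone who wants to check their own installation, here is a minimal sketch of a test script, not the original csv2qif.pl. It prints the installed Text::CSV version and parses a shortened, made-up three-column version of the PayPal row above, including the "0.00" field that triggered the problem. The in-memory data, the `%parsed` hash, and the `$have_csv` flag are assumptions added for illustration.

```perl
use strict;
use warnings;

# Text::CSV comes from CPAN rather than core Perl, so load it defensively.
my $have_csv = eval { require Text::CSV; 1 };

my %parsed;
if ($have_csv) {
    print "Text::CSV version: ", Text::CSV->VERSION, "\n";

    # Shortened, hypothetical stand-in for the PayPal export in the question.
    my $data = qq{Gross, Fee, Net\n"1.00","0.00","1.00"\n};
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

    my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    # The first line supplies the column names used by getline_hr.
    $csv->column_names(@{ $csv->getline($fh) });
    my $row = $csv->getline_hr($fh) or die "Could not parse the data row";
    %parsed = %$row;
    close $fh;

    # On a working installation each field should come back intact.
    print "$_ = $parsed{$_}\n" for qw(Gross Fee Net);
}
else {
    print "Text::CSV is not installed\n";
}
```

Saving this to a file and running it with `perl` also sidesteps the quoting difference between the Windows shell and Unix shells that broke the one-liner in the comments.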

rboy