Can I use Text::CSV_XS to parse a csv-format string without writing it to disk?

Question

I am getting a "csv file" from a vendor (using their API), but what they do is just spew the whole thing into their response. It wouldn't be a significant problem except that, of course, some of those pesky humans entered the data and put in "features" like line breaks. What I am doing now is creating a file for the raw data and then reopening it to read the data:

open RAW, ">", "$rawfile" or die "ERROR: Could not open $rawfile for write: $! \n";
print RAW $response->content;
close RAW;

my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
open my $fh, "<", "$rawfile" or die "ERROR: Could not open $rawfile for read: $! \n";

while ( $line = $csv->getline ($fh) ) { ...

Somehow this seems ... inelegant. It seems that I ought to be able to just read the data from the $response->content (multiline string) as if it were a file. But I'm drawing a total blank on how do this. A pointer would be greatly appreciated. Thanks, Paul

score 7 · Accepted Answer · answered Mar 23 '20 at 17:51

7

You could use a string filehandle:

my $data = $response->content;
open my $fh, "<", \$data or croak "unable to open string filehandle : $!";
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
while ( $line = $csv->getline ($fh) ) { ... }

answered Mar 23 '20 at 17:51

GMB

216,147
25
84
135

4

This is one of my favorite tricks in Perl, and I write quite a bit about it in [Effective Perl Programming](https://www.effectiveperlprogramming). Treating many things as a filehandle means you have an easier and familiar interface. It goes the other way too; you can write to a filehandle but have it show up in a string. – brian d foy Mar 23 '20 at 19:22
3

Yes, nice, I use that too -- one just shouldn't forget that it isn't a proper filehandle, so to not run into trouble; see [this post](https://stackoverflow.com/q/47045197/4653379) for example. – zdim Mar 24 '20 at 06:29
1

Well, Thank you! That was exactly what I was looking for but not quite getting. I can no longer remember exactly what combinations I had tried, but I was evidently close but not getting the syntax quite right. – Paul R N Mar 24 '20 at 13:58

zdim · Answer 2 · 2020-03-24T20:04:50.260

Yes, you can use Text::CSV_XS on a string, via its functional interface

use warnings;
use strict;
use feature 'say';

use Text::CSV_XS qw(csv);  # must use _XS version

my $csv = qq(a,line\nand,another);

my $aoa = csv(in => \$csv) 
    or die Text::CSV->error_diag; 

say "@$_" for @aoa;

Note that this indeed needs Text::CSV_XS (normally Text::CSV works but not with this).

I don't know why this isn't available in the OO interface (or perhaps is but is not documented).

While the above parses the string directly as asked, one can also lessen the "inelegant" aspect in your example by writing content directly to a file as it's acquired, what most libraries support like with :content_file option in LWP::UserAgent::get method.

Let me also note that most of the time you want the library to decode content, so for LWP::UA to use decoded_content (see HTTP::Response).

brian d foy · Answer 3 · 2020-03-24T08:59:24.420

I cooked up this example with Mojo::UserAgent. For the CSV input I used various data sets from the NYC Open Data. This is also going to appear in the next update for Mojo Web Clients.

I build the request without making the request right away, and that gives me the transaction object, $tx. I can then replace the read event so I can immediately send the lines into Text::CSV_XS:

#!perl

use v5.10;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

my $url = ...;
my $tx = $ua->build_tx( GET => $url );

$tx->res->content->unsubscribe('read')->on(read => sub {
    state $csv = do {
        require Text::CSV_XS;
        Text::CSV_XS->new;
        };
    state $buffer;
    state $reader = do {
        open my $r, '<:encoding(UTF-8)', \$buffer;
        $r;
        };

    my ($content, $bytes) = @_;
    $buffer .= $bytes;
    while (my $row = $csv->getline($reader) ) {
        say join ':', $row->@[2,4];
        }
    });

$tx = $ua->start($tx);

That's not as nice as I'd like it to be because all the data still show up in the buffer. This is slightly more appealing, but it's fragile in the ways I note in the comments. I'm too lazy at the moment to make it any better because that gets hairy very quickly as you figure out when you have enough data to process a record. My particular code isn't as important as the idea that you can do whatever you like as the transactor reads data and passes it into the content handler:

use v5.10;
use strict;
use warnings;
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

my $url = ...;
my $tx = $ua->build_tx( GET => $url );

$tx->res->content
    ->unsubscribe('read')
    ->on( read => process_bytes_factory() );

$tx = $ua->start($tx);

sub process_bytes_factory {
    return sub ( $content, $bytes ) {
        state $csv = do {
            require Text::CSV_XS;
            Text::CSV_XS->new( { decode_utf8 => 1 } );
            };
        state $buffer = '';
        state $line_no = 0;

        $buffer .= $bytes;
        # fragile if the entire content does not end in a
        # newline (or whatever the line ending is)
        my $last_line_incomplete = $buffer !~ /\n\z/;

        # will not work if the format allows embedded newlines
        my @lines = split /\n/, $buffer;
        $buffer = pop @lines if $last_line_incomplete;

        foreach my $line ( @lines ) {
            my $status = $csv->parse($line);
            my @row = $csv->fields;
            say join ':', $line_no++, @row[2,4];
            }
        };
    }

Can I use Text::CSV_XS to parse a csv-format string without writing it to disk?

3 Answers3