-2

I want to able to read this CSV file into an array of arrays or hashes for manipulation. How can I go about it?

For example my file contains the following (the first line is the header):

Name,Age,Items,Available
John,29,laptop,mouse,Yes
Jane,28,desktop,keyboard,mouse,yes
Doe,56,tablet,keyboard,trackpad,touchpen,Yes

First column is name, second is Age, third is Items, But items can contain more than one thing separated by commas, and last column is Person availability.

How can I accurately read this?

Borodin
  • 126,100
  • 9
  • 70
  • 144
Kiluvya.a
  • 15
  • 4
  • 5
    Could any of the other fields besides "Items" contain commas? For example, are names like "Martin Luther King, Jr." possible? – ThisSuitIsBlackNot Jun 13 '16 at 16:16
  • 1
    That's okay. But please remember to *only ever copy and paste code and data into your question* that we can use to reproduce the problem that you're having, and *never type it in*. Programming has to be extremely precise, and a missing space, or a tab instead of a space can make all of the difference. We are used to most of the common errors, but there's no way to tell that your data is wrong if it looks correct. – Borodin Jun 13 '16 at 18:21
  • @PerlDog, As I learn, I tend to underestimate the complexity of the problem, sorry for my amateur mistakes. – Kiluvya.a Jun 13 '16 at 18:22

3 Answers3

5

Well-formed CSV quotes fields that contain a comma as part of the value. If your CSV is well-formed use the Text::CSV module:

use Text::CSV;

my $csv = Text::CSV->new();
while (my $row = $csv->getline(\*DATA)) {
    my $name      = $row->[0];
    my $age       = $row->[1];
    my @items     = split /,/, $row->[2];
    my $available = $row->[3];
    print "$name/$age/@items/$available\n";
}

__DATA__
Name,Age,Items,Available
John,29,"laptop,mouse",Yes
Jane,28,"desktop,keyboard,mouse",yes
Doe,56,"tablet,keyboard,trackpad",touchpen,Yes

Output:

Name/Age/Items/Available
John/29/laptop mouse/Yes
Jane/28/desktop keyboard mouse/yes
Doe/56/tablet keyboard trackpad touchpen/Yes

If your CSV is not well-formed you'll need to implement a custom parse based on knowledge of your data. Assuming that the Items column is the only multi-valued field you can split on a comma and then remove the fields with a known position. Whatever is left is the items.

while (my $line = <DATA>) {
    chomp $line;
    my @record    = split /,/, $line;
    my $name      = shift @record;
    my $age       = shift @record;
    my $available = pop   @record;
    my @items     = @record;

    print "$name/$age/@items/$available\n";
}

__DATA__
Name,Age,Items,Available
John,29,laptop,mouse,Yes
Jane,28,desktop,keyboard,mouse,yes
Doe,56,tablet,keyboard,trackpad,touchpen,Yes

Alternately, you could use array slicing to get the same result:

 my ($name, $age, $available, @items) = @record[0, 1, -1, 2 .. @record - 2];
Michael Carman
  • 30,628
  • 10
  • 74
  • 122
  • As matter of fact, you are right with column Items containing “item,item,item” And I think I understand your solution, so I should get the columns that are well formated then make the rest an array( which will be elements in “items”. – Kiluvya.a Jun 13 '16 at 16:32
  • 2
    @Kiluvya.a You mean there are double quotes? If so, please edit your question to show a sample of the actual data you're using. – ThisSuitIsBlackNot Jun 13 '16 at 16:36
  • Name,Age,Items,Available John,29,"laptop,mouse",Yes Jane,28,"desktop,keyboard,mouse",yes Doe,56,"tablet,keyboard,trackpad,touch pen",Yes – Kiluvya.a Jun 13 '16 at 16:40
  • 3
    @Kiluvya.a That totally changes things. It looks like you do have a well-formed CSV after all, so you should just use a proper CSV parser like [Text::CSV_XS](https://metacpan.org/pod/Text::CSV_XS). Keep in mind that if we can't see a representative sample of your real data, we may not be able to offer the most appropriate solutions. Michael's answer is an excellent solution to the problem you originally described, but apparently that's not the actual problem you have. – ThisSuitIsBlackNot Jun 13 '16 at 16:48
  • 1
    @Kiluvya.a As ThisSuitIsBlackNot said, that changes things. I've updated my answer to address data both with and without quoted fields. – Michael Carman Jun 13 '16 at 16:57
  • @ThisSuitIsBlackNot and Michael Carman, Thank you. I am still working on the solution myself. And thanks for teaching me how to ask questions properly – Kiluvya.a Jun 13 '16 at 18:20
2

Since your data is, in reality, a properly-formatted CSV file, you can use the standard tools to read and store it

Here's the data I'm now assuming that you have

Name,Age,Items,Available
John,29,"laptop,mouse",Yes
Jane,28,"desktop,keyboard,mouse",yes
Doe,56,"tablet,keyboard,trackpad,touch pen",Yes

Solution

Like my original answer, this code uses Text::CSV to parse each line of input. But instead of having to reformat it, each row may be pushed directly onto array @data

Also as before, it conforms to the standard of reading from STDIN. But this time I have used Data::Dump to reveal the in-memory data structure that has been built. If you run it on the command line you should use

$ perl unpack_csv.pl text.csv
use strict;
use warnings 'all';

use Text::CSV;

my $csv = Text::CSV->new;

my @data;

while ( <> ) {
    $csv->parse($_);
    my @row = $csv->fields;
    push @data, \@row;
}

use Data::Dump;
dd \@data;
Borodin
  • 126,100
  • 9
  • 70
  • 144
0

Update

I now realise that the OP's file may well contain properly-formatted CSV data, which makes this answer superfluous

However the question has not been changed to show the real data, so I am leaving this answer here in case the question's subject line and content entices people with a problem that this will solve


I recommend that you use an intermediate program to format your CSV file properly. Once you have a standard-format file, the resulting output can then be processed using Perl with Text::CSV, Excel, or anything similar

This program uses Text::CSV to read your input data and write the Items column enclosed in quotes if necessary

It works by using Text::CSV->parse to split each line into fields, and then reserving the first two and final fields for new fields 1, 2 and 4. Whatever is left is joined with a comma , and used for field 3. The four resulting values are passed back to Text::CSV->combine and printed

It conforms to the standard of reading from STDIN and writing to STDOUT, so if you run it on the command line you should use

$ perl reformat_csv.pl text.csv > new_text.csv
use strict;
use warnings 'all';

use Text::CSV;

my $csv = Text::CSV->new;

while ( <> ) {

    $csv->parse($_);
    my @row = $csv->fields;

    my $f1 = shift @row;
    my $f2 = shift @row;
    my $f4 = pop @row;

    my $f3 = join ',', @row;

    $csv->combine($f1, $f2, $f3, $f4);
    print $csv->string, "\n";
}

output

Name,Age,Items,Available
John,29,"laptop,mouse",Yes
Jane,28,"desktop,keyboard,mouse",yes
Doe,56,"tablet,keyboard,trackpad,touchpen",Yes
Borodin
  • 126,100
  • 9
  • 70
  • 144