Extracting specific multi-line records that are pipe-delimited in Perl

Question

I have a file that looks like

NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

I want to separate the records by country. I have split each line into the array variable @fields

my @fields = split(/\|/, $_ );

using $fields[3] as the basis for separating them. I want to split the records into 2 output text files

OUTPUT TEXT FILE 1:

NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

OUTPUT TEXT FILE 2

NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE

Everything from JPN should go to output text file 1, and every non-JPN country to output text file 2.

Here's the code that I'm trying to get working:

use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);

my @fields;
my $tmp_var;
my $count;
;
my ($line, $i);

my $filename = 'data.txt';
open(my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";


open(OUTPUTA, ">", 'JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
open(OUTPUTB, ">", 'Non-JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";

my $fh;
while (<$input_fh>) {

    chomp;
   my @fields = split /\|/;


   if ($fields[0] eq 'NAME') {
    for ($i=1; $i < @fields; $i++) {
        if ($fields[3] eq 'JPN') {
           $fh = $_;
            print OUTPUTA $fh;
        }
        else {
           $fh = $_;
            print OUTPUTB $fh;
        }
    }

}   
}

close(OUTPUTA);
close(OUTPUTB)

Still no luck with it :(

Sorry, I still need 15 rep before I'll be able to vote. – Soncire Mar 21 '13 at 00:06

ikegami · Answer 1 · 2013-03-20T04:05:27.007

1

You didn't say what you needed help with, so I'm assuming it's coming up with an algorithm. Here's a good one:

1. Open the file to read.
2. Open the file for the JPN entries.
3. Open the file for the non-JPN entries.
4. While not eof,
   1. Read a line.
   2. Parse the line.
   3. If it's the first line of a record,
      1. If the person's country is JPN,
         1. Set current file handle to the file handle for JPN entries.
      2. Else,
         1. Set current file handle to the file handle for non-JPN entries.
   4. Print the line to the current file handle.

use strict;
use warnings;
use feature qw( say );   # say() is used below to print each line

my $jpn_qfn   = '...';
my $other_qfn = '...';

open(my $jpn_fh,   '>', $jpn_qfn)
   or die("Can't create $jpn_qfn: $!\n");
open(my $other_fh, '>', $other_qfn)
   or die("Can't create $other_qfn: $!\n");

my $fh;
while (<>) {
   chomp;
   my @fields = split /\|/;
   if ($fields[0] eq 'NAME') {
      $fh = $fields[3] eq 'JPN' ? $jpn_fh : $other_fh;
   }

   say $fh $_;
}

edited Mar 20 '13 at 04:05

answered Mar 20 '13 at 01:41

ikegami


Since I'm new to Perl, can you show me how to extract each group of 3 lines? – Soncire Mar 20 '13 at 01:51
You don't have to; changing which file you are writing to on the first line of a record (steps 4.3.1.1 and 4.3.2.1) automatically makes the next two lines go to the right place. – ysth Mar 20 '13 at 02:16
@Soncire, Where do you see "extract 3 lines" anywhere in what I posted? – ikegami Mar 20 '13 at 03:55
Since `<$fh>` reads one line, `<$fh>` three times would read three lines. – ikegami Mar 20 '13 at 03:56
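
The answer above does not need to read three lines at a time, since switching the handle on the NAME line is enough. Still, the point of the last comment can be sketched as follows, assuming every record is exactly three lines (NAME, AGE, INFO) as in the question's sample. The helper name read_record and the in-memory filehandle are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read one three-line record from a filehandle.
# Returns the lines (newlines intact), or an empty list at end of file.
sub read_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 3) {
        my $line = <$fh>;          # <$fh> in scalar context reads one line
        last unless defined $line;
        push @lines, $line;
    }
    return @lines;
}

# Sample input taken from the question, read through an in-memory filehandle.
my $data = <<'END';
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
END

open(my $in, '<', \$data) or die("Can't open in-memory file: $!\n");

my (@jpn, @other);
while (my @record = read_record($in)) {
    # The country is the fourth field of the record's NAME line.
    chomp(my $name_line = $record[0]);
    my @fields = split /\|/, $name_line;
    if ($fields[3] eq 'JPN') {
        push @jpn, @record;
    }
    else {
        push @other, @record;
    }
}

print "JPN:\n", @jpn, "Other:\n", @other;
```

This variant breaks if a record ever has more or fewer than three lines, which is one reason the handle-switching loop in the answer may be the more robust choice.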

score 1 · Accepted Answer · edited Mar 20 '13 at 04:01

1

Here is what I think ikegami was describing. I've never tried this approach before, although it gave the correct results.

#!/usr/bin/perl
use strict;
use warnings;

open my $jpn_fh, ">", 'o33.txt' or die $!;
open my $other_fh, ">", 'o44.txt' or die $!;

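my $fh;    # output handle for the record currently being copied
while (<DATA>) {
    # A line starting with NAME begins a new record, so choose its file here.
    if (/^NAME/) {
        if (/JPN$/) {          # country is the last field on the NAME line
            $fh = $jpn_fh;
        }
        else {
            $fh = $other_fh;
        }
    }
    # The AGE and INFO lines keep going to the handle chosen for their NAME line.
    print $fh $_;
}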

close $jpn_fh or die $!;
close $other_fh or die $!;

__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

edited Mar 20 '13 at 04:01

ikegami


answered Mar 20 '13 at 02:17

Chris Charley


Yep, that solved my problem, Chris. Can you please write a comment on each line so I can understand your code? Thank you very much. – Soncire Mar 20 '13 at 02:31
If the line you've read begins with NAME, (/^NAME/), then if the same line ends in JPN, (/JPN$/), set the filehandle to $jpn_fh, otherwise set it to $other_fh. Then the print below will direct it to the correct file. – Chris Charley Mar 20 '13 at 02:41
Thanks, Chris. I have a subroutine that removes spaces and other stuff: sub _trim { my $word = shift; if ( $word ) { $word =~ s/\A\s+|\s+\z//g; $word =~ s/\s+/ /g; $word =~ s/\|\s*/\|/g; $word =~ s/\s*\|/\|/g; $word =~ s/\$\s+/\$/g; $word =~ s/^\s+//; $word =~ s/"//g; } return $word; } How would I embed it in your code? – Soncire Mar 20 '13 at 02:57
One more thing: what if the line doesn't end in JPN? What if the line looks like this: NAME|JOHN|JPN|TOKYO – Soncire Mar 20 '13 at 03:00
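
For the follow-up about a line such as NAME|JOHN|JPN|TOKYO, one option is to split the NAME line and compare a specific field instead of anchoring the regex at the end of the line. The sketch below assumes the position of the country field is known in advance. The names $country_idx and is_jpn_record are made up for illustration, and the sample lines are the question's data with the last two NAME fields swapped.

```perl
use strict;
use warnings;

# Assumption: the zero-based position of the country field is known.
# 3 matches the question's sample (NAME|JOHN|TOKYO|JPN);
# 2 matches NAME|JOHN|JPN|TOKYO.
my $country_idx = 2;

# Hypothetical helper: true if the NAME line's country field is exactly JPN.
sub is_jpn_record {
    my ($line, $idx) = @_;
    chomp $line;                       # drop the newline so the last field compares cleanly
    my @fields = split /\|/, $line;
    return defined($fields[$idx]) && $fields[$idx] eq 'JPN';
}

my @lines = (
    "NAME|JOHN|JPN|TOKYO\n",
    "AGE|32|M\n",
    "INFO|SINGLE|PROFESSIONAL|IT\n",
    "NAME|MARK|PH|MANILA\n",
    "AGE|37|M\n",
    "INFO|MARRIED|PROFESSIONAL|BPO\n",
);

my (@jpn, @other);
my $bucket;                            # array the current record is appended to
for my $line (@lines) {
    # Only NAME lines carry the country, so only they change the destination.
    if ($line =~ /^NAME\|/) {
        $bucket = is_jpn_record($line, $country_idx) ? \@jpn : \@other;
    }
    push @$bucket, $line;
}

print "JPN:\n", @jpn, "Other:\n", @other;
```

In the loop from the answer above, the /JPN$/ test would be replaced by a call such as is_jpn_record($_, $country_idx), with the two arrays replaced by the two output filehandles.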

score 0 · Answer 3 · answered Mar 20 '13 at 04:14

#!/usr/bin/env perl

use 5.012;
use autodie;
use strict;
use warnings;

# store per country output filehandles
my %output;

# since this is just an example, read from __DATA__ section

while (my $line = <DATA>) {
    # split the fields
    my @cells = split /[|]/, $line;

    # if first field is NAME, this is a new record
    if ($cells[0] eq 'NAME') {
        # get the country code, strip trailing whitespace
        (my $country = $cells[3]) =~ s/\s+\z//;

        # if we haven't created an output file for this
        # country yet, do so
        unless (defined $output{$country}) {
            open my $fh, '>', "$country.out";
            $output{$country} = $fh;
        }
        my $out = $output{$country};

        # output this and the next two lines to
        # country specific output file
        print $out $line, scalar <DATA>, scalar <DATA>;
    }
}

close $_ for values %output;

__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

score 0 · Answer 4 · answered Mar 22 '13 at 06:26

Thanks heaps for your help. I was able to solve this problem in Perl. Many thanks.

#!/usr/local/bin/perl

use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);

my @fields;
my $tmp_var;
my ($rec_type, $country);

my $filename = 'data.txt';


open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";


open  my $OUTPUTA, ">", 'o33.txt' or die $!;
open  my $OUTPUTB, ">", 'o44.txt' or die $!;

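my $out_fh;    # handle for the record currently being written
while (<$input_fh>) {

    chomp;     # without this the last field would be "JPN\n" and never equal 'JPN'
    $_ = _trim($_);
    @fields = split (/\|/, $_);
    $rec_type = $fields[0];
    $country = $fields[3];

    # A NAME line starts a new record, so pick the output file here;
    # the AGE and INFO lines that follow go to the same handle.
    if ($rec_type eq 'NAME') {
        $out_fh = $country eq 'JPN' ? $OUTPUTA : $OUTPUTB;
    }
    print {$out_fh} "$_\n";
}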

close $OUTPUTA or die $!;
close $OUTPUTB or die $!;

sub _trim {
    my $word = shift;
    if ( $word ) {      
        $word =~ s/\s*\|/\|/g;      # remove whitespace before each pipe
        $word =~ s/"//g;            # remove double quotes
    }
    return $word;
}
