-1

I have a format file with following contents

FIELD NO.,FIELD NAME,STARTING POSITION,ENDING POSITION,LENGTH,INDICATOR
1,SEQ_NO,1,11,11,N
2,CTR_REC_ID,12,14,3,N
3,CTR_SEQ_AMT,15,23,9,A
4,CTR_CONTRACT_NO,24,46,23,N
5,CTR_CONTRACT_AMT,47,59,13,A
6,CTR_TRACK_NO,60,62,3,N

The file gives details about each field's start position, end position and length.

My script reads the format file and stores the contents into an array. Then the source file is read line by line and when the indicator in format file is A convert the EBCDIC character into suitable ASCII character.

Here's a sample source file with an index showing the character positions and the fields listed in the format file. It is not part of the data file.

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
[         ][ ][       ][                     ][           ][ ]
         5|CTR00002173{09C00000000001         000000000201AE00      
        22|CTR00002243A52C00000000007         000000002358JF00
        24|CTR00008456J52C00000000008         000000002465{F00

As per the format file, the CTR_SEQ_AMT value of the first line is 00002173{. { must be converted to 0 and replace leading zero with blank space and also make it a decimal value(decimal(9,2)).

The final output will look like this:

     5|CTR    217.3009C00000000001                  20.11E00
    22|CTR    224.3152C00000000007                -235.85F00
    24|CTR   -845.6552C00000000008                 246.50F00

{ --> 0, A --> 1, J --> 5 (Convert the decimal value to negative).

I have got the above output using Perl, but the performance of my script is bad.

My source file can have a million records. As per my script I have 2 loops: one to read the format file and store data into an array, and another to read the source file line by line and do the conversion. So if a format file has 10 fields to be converted, my code will loop through those million lines 10 times.

Instead of having two loops, can I have a single loop for reading the format file and when indicator of a field is A perform the following operations in all the lines at a time for a particular field:

  • Replace leading zero with blank
  • Convert EBCDIC character to ASCII character
  • Make it a decimal value.

Currently my code has following line to perform the above 3 steps.Here line has each line from source file.

$f_cnt   = $start_pos - 1;
$dec_cnt = $array_length[$cnt] - 2;
$field   = substr("$line",  $end_pos -1, 1);

if ( $field  eq '{' ) {
     print "replacing { \n";
     $x = substr($line, $f_cnt, $dec_cnt);
     $x =~ s/^(0*)/' ' x length($1)/e;
     substr($line, $f_cnt, $dec_cnt, "$x.");
     substr($line, $end_pos, 1, "0") ;
 }
Borodin
  • 126,100
  • 9
  • 70
  • 144
user1768029
  • 415
  • 8
  • 22
  • 1
    I find this description very hard to understand. What are the 5,22,24 at the start of the source file? Where is the indicator in the source file - it seems to be in the format file? – Mark Setchell Apr 08 '17 at 13:48
  • Sorry for the confusion.Added pipes above source file data to know the position of each field.Pipes are not part of source file data.Now,if we check format file, first field is SEQ_NO and it has 11 characters starting from 1st to 11th position in source file.INDICATOR is in format file.We need to read format file to know each field details in source file.Indicator of CTR_CONTRACT_AMT field in format file is A which says conversion is required.I hope this description clear your doubt @Mark Setchell – user1768029 Apr 08 '17 at 14:46
  • 1
    It still doesn't make sense! If you have 10 fields to convert, you read the format file just **once** at the start, then as you read each line of the data file you make all 10 conversions and write the corresponding output line. You never have to store more than one line in memory as you never need any other line to process the current line so it cannot be a memory problem. – Mark Setchell Apr 08 '17 at 15:10
  • So what you're trying to do at present is to convert the EBCDIC to ASCII and create a new file. Is that right? – Borodin Apr 08 '17 at 15:32
  • @Borodin. Yes correct!! Now format file is read at once and store it in an array. Then source file will be read line by line.each field in a line is checked to find if it's eligible for conversion and do conversion if it's eligible.My concern here is , instead of looping line by line , is it possible in perl to do conversion in all the lines at a time like sed in unix. – user1768029 Apr 08 '17 at 17:00
  • I don't understand how `{` is in any sense the EBCDIC version of zero. In fact none of it looks like EBCDIC to me. What is supposed to happen to the `A` and `J` which also appear as the last character of the fields to be converted? How did `-845.65` turn out to be negative? – Borodin Apr 08 '17 at 17:00
  • You have a bug in ebcdic conversion: 00008456J -> -845.61. see http://www.3480-3590-data-conversion.com/article-signed-fields.html – BOC Apr 10 '17 at 12:28

1 Answers1

0

After some optimisation, I have:

my %ebcdic_fields;

while (my $line = <DATA>) {
    my (undef, undef, $start, undef, $length, $indicator) = split /,/, $line;
    next if $indicator !~ m/^A/;
    $ebcdic_fields{$start-1} = $length - 2;
}

while (my $line = <>) {
    while (my ($start, $length) = each %ebcdic_fields) {
        my $fpos = $start + $length + 1;
        my $before = substr ($line, $start, $length);
        my $format = substr ($line, $fpos, 1);
        my $trimed_before = $before + 0; # keep at least one 0 before the dot
        if ($format ge 'J' and $format ne '{') {
            substr ($line, $fpos, 1) =~ tr/}JKLMNOPQR/0123456789/;
            substr ($line, $start, $length) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.';
        } else {
            substr ($line, $fpos, 1) =~ tr/{ABCDEFGHI/0123456789/;
            substr ($line, $start, $length) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.';
        }
    }
    print $line;
}
__DATA__
1,SEQ_NO,1,11,11,N
2,CTR_REC_ID,12,14,3,N
3,CTR_SEQ_AMT,15,23,9,A
4,CTR_CONTRACT_NO,24,46,23,N
5,CTR_CONTRACT_AMT,47,59,13,A
6,CTR_TRACK_NO,60,62,3,N

Sadly, the fastest I have achieved is with this:

my %ebcdic_fields;

while (my $line = <DATA>) {
    my (undef, undef, $start, undef, $length, $indicator) = split /,/, $line;
    next if $indicator !~ m/^A/;
    $ebcdic_fields{$start-1} = $length - 2;
}

while (my $line = <>) {
    while (my ($start, $length) = each %ebcdic_fields) {
        my $format = substr ($line, $start + $length + 1, 1);
        my $trimed_before = (substr ($line, $start, $length) + 0); # keep at least one 0 before the dot
        if  ($format eq '{') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '0';
        } elsif  ($format eq 'A') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '1';
        } elsif  ($format eq 'B') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '2';
        } elsif  ($format eq 'C') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '3';
        } elsif  ($format eq 'D') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '4';
        } elsif  ($format eq 'E') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '5';
        } elsif  ($format eq 'F') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '6';
        } elsif  ($format eq 'G') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '7';
        } elsif  ($format eq 'H') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '8';
        } elsif  ($format eq 'I') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 1) . $trimed_before . '.' . substr ($line, $start + $length, 1) . '9';
        } elsif ($format eq '}') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '0';
        } elsif ($format eq 'J') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '1';
        } elsif ($format eq 'K') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '2';
        } elsif ($format eq 'L') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '3';
        } elsif ($format eq 'M') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '4';
        } elsif ($format eq 'N') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '5';
        } elsif ($format eq 'O') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '6';
        } elsif ($format eq 'P') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '7';
        } elsif ($format eq 'Q') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '8';
        } elsif ($format eq 'R') {
            substr ($line, $start, $length+2) = ' ' x ($length - length($trimed_before) - 2) . '-' . $trimed_before . '.' . substr ($line, $start + $length, 1) . '9';
        }
    }
    print $line;
}
BOC
  • 1,109
  • 10
  • 20