Bash read & parse file - loop performance

Question

I'm trying to read a file and parse it in bash. I need to run dd to convert it from EBCDIC to ASCII, then loop, reading X bytes at a time and writing each X-byte chunk as a row in a new file:

#!/bin/bash

# $1 = input file in EBCDIC
# $2 = row length
# $3 = output file

# convert to ASCII and replace NUL (^@) with ' '
dd conv=ascii if="$1" | sed 's/\x0/ /g' > "$3.tmp"
file=$(cat "$3.tmp")
sIndex=0    # bash substring offsets are zero-based
fIndex=$2

# remove any previous output file
rm -f "$3"
echo "filesize: ${#file}"

# loop, retrieving each fixed-size record and appending to a file
while [ "$sIndex" -lt "${#file}" ]; do
    # append record to file
    echo "${file:sIndex:fIndex}" >> "$3"

    # advance to the start of the next record;
    # the loop ends once the offset passes the end of the data
    sIndex=$((sIndex + fIndex))
done

# remove tmp
rm "$3.tmp"

Any way to make this whole process faster?

Don't use a temporary file. Just stream the conversion directly to the while loop and use `read` to read fixed size chunks. Though that's likely also going to be slow (the question is just where all the time is being taken up currently). Do you know what the slow part of your script is currently? — Etan Reisner, Apr 15 '15 at 02:12
Just using the temp file for now so I can see what the `dd` returns. Regarding `read`, the file is not exactly line-based: the data is stored as-is in the file (no delimiters, etc.). Not sure which part is slow, probably because the file has many records to process. — Travis Liew, Apr 15 '15 at 03:47
`read` can read by character count as well as by line, and getting profiling information should always be the first step when optimizing something. — Etan Reisner, Apr 15 '15 at 04:33
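
The `read`-based approach from the comments above could be sketched roughly as follows. This is a minimal sketch, not the commenter's code: `split_records` is a hypothetical helper name, it assumes a bash version that supports `read -N`, and it assumes NUL bytes have already been replaced upstream, since bash variables cannot hold them.

```bash
#!/bin/bash

# split_records LEN (hypothetical helper):
# read stdin and print each LEN-character chunk on its own line
split_records() {
    local len=$1 record
    # -r keeps backslashes literal; -N reads exactly LEN characters
    # and does not treat newlines or other delimiters specially
    while IFS= read -r -N "$len" record; do
        printf '%s\n' "$record"
    done
    # at end of input read returns non-zero, but a short final
    # chunk is still stored in the variable, so print it if present
    if [ -n "$record" ]; then
        printf '%s\n' "$record"
    fi
}
```

It would slot into the pipeline as `dd conv=ascii if="$1" | sed 's/\x0/ /g' | split_records "$2" > "$3"`. The loop still runs once per record, so it may remain slow on large files; timing each stage (for example with `time`) is the way to find out.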

score 1 · Answer 1 · answered Apr 15 '15 at 04:15

Answering my own question. The answer is very simple with the use of fold!

# $1 = EBCDIC input file
# $2 = file record length (e.g. 100)
# $3 = output file (non-delimited, row-separated file)

# dd : convert from EBCDIC to ASCII
# sed : replace NUL (^@) with space ' '
# fold : wrap input to specified width (record length)

dd conv=ascii if="$1" | sed 's/\x0/ /g' | fold -w "$2" > "$3"
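
To see what `fold` does with undelimited data, here is a small sketch on made-up input. The 12-byte sample string and the `fold_records` helper name are assumptions for illustration, not data or code from the question. It uses `-b` so that the width is counted in bytes rather than display columns, which matters if the records can contain tabs or backspaces.

```bash
#!/bin/bash

# fold_records LEN (hypothetical helper):
# wrap stdin into lines of LEN bytes each
fold_records() {
    fold -b -w "$1"
}

# three hypothetical 4-byte records with no delimiters between them
printf 'AAAABBBBCCCC' | fold_records 4
```

Each 4-byte record ends up on its own line. Note that `fold` also restarts its count at any newline already present in the input, so this relies on the converted data containing no newline characters.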
