BASH: Using cut on space delimited file: Treating two spaces as one

Question

I need to convert a file full of lines like this:

# 2007  4 29 10  1 17.98 blah  other   stuff

into lines formatted like this:

2007.04.29.10.01.17

The original line is space-delimited, and a single-digit number (such as 4) is written with a leading space, as ' 4'. When I convert it, I need to change that to '04'. So the file contains spaces that delimit fields AND spaces that are placeholders for leading zeros.

I need to write a shell script to make that conversion. I tried using the cut command because each character always stays in exactly the same place, so the 7th character is always a delimiting space and the 8th character is always either the tens digit or a space that should become a leading zero. However, I soon discovered that it treats two spaces as one, which totally throws off the count (since sometimes I have ' 4' and sometimes I have '14').

So I need a way to read and convert this file, either with cut or with some other tool (awk?). Either a way to modify my current code (below) or a different approach that works better would be much appreciated.

Just for reference, my present code is below:

while read LINE
do
    #IF line starts with '#', then
    if [[ $LINE == "#"* ]]; then

       #123456789012345678901
        # 2008 12 26 11 26 20.36
        # 2007  5 10  1  8 10.52

        #GET 4 digit year
        LINEyear=$(echo $LINE | cut -c3-6)

        #GET 2 digit month
        if [ $(echo $LINE | cut -c8-8) == " " ]; then
            LINEmonth=0$(echo $LINE | cut -c8-9)                
        else
            LINEmonth=$(echo $LINE | cut -c8-9)
        fi

        #GET 2 digit day
        if [ $(echo $LINE | cut -c11-11) == " " ]; then
            LINEday=0$(echo $LINE | cut -c11-12)
        else
            LINEday=$(echo $LINE | cut -c11-12)
        fi

        #GET hour, min, sec, (Removed to save space)

        LINEnew=$LINEyear.$LINEmonth.$LINEday.$LINEhour.$LINEmin.$LINEsec
        echo $LINEnew

    fi
done

Most times you find yourself writing a loop in shell, you are using the wrong tool. There are exceptions of course, but they involve process or file manipulation (create/destroy) not text manipulation. — Ed Morton, May 02 '13 at 12:24
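
For what it is worth, the collapsing described in the question does not come from cut itself. Because $LINE is expanded without quotes, the shell splits it into words and echo rejoins them with single spaces before cut ever sees the text. A minimal sketch of the column-based approach with the variable quoted follows; the sample line is taken from the comment in the question's code, and the helper name field is made up for illustration.

```shell
# Sample line copied from the comment in the question's code.
line='# 2007  5 10  1  8 10.52'

# Unquoted expansion: the shell word-splits $line and echo rejoins the
# words with single spaces, so the columns shift before cut runs.
collapsed=$(echo $line | cut -c8-9)

# Hypothetical helper: cut a fixed column range from the quoted line and
# turn any placeholder space into a zero.
field() {
    printf '%s\n' "$line" | cut -c"$1" | tr ' ' '0'
}

# Column ranges follow the ruler in the question: year, month, day,
# hour, minute, and the whole-second part.
stamp="$(field 3-6).$(field 8-9).$(field 11-12).$(field 14-15).$(field 17-18).$(field 20-21)"

echo "unquoted columns 8-9: '$collapsed'"
echo "$stamp"
```

Inside the original loop, the same idea would mean writing "$LINE" (with the double quotes) wherever the line is echoed into cut.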

score 2 · Accepted Answer · answered May 01 '13 at 23:56

You can solve this in just one line of awk:

% awk '/^#/ {printf "%04d.%02d.%02d.%02d.%02d.%02d\n", $2, $3, $4, $5, $6, $7}' ~/stuff

Yields:

2007.04.29.10.01.17

johnsyweb

Okay. Well, now I am getting the correct formatting (thanks), but I am no longer storing the string in a variable, which I need to do because I have to compare it against another file of strings later on. I tried LineNew=awk '/^#/ {printf.... etc, but that did not work – Brian C May 02 '13 at 01:11
I see. In your question, you're just `echo`ing the matched lines, and that is what I was emulating / fixing. awk can do the comparisons too, I am sure, but that seems to be a different question and should be asked as such. – johnsyweb May 02 '13 at 02:43
I just printed the output to a file, and I will write a separate program to compare it to the other file I have. Not as neat, but it should work just fine. – Brian C May 02 '13 at 05:01
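
One way to keep the formatted string in a shell variable, as asked in the comments on this answer, is command substitution: wrap the awk command in $( ). A minimal sketch, assuming the input line is already held in a variable (the name line is hypothetical; LINEnew is the name used in the question):

```shell
# Hypothetical sample line; in practice it would come from the input file.
line='# 2007  4 29 10  1 17.98 blah  other   stuff'

# $( ... ) runs the pipeline and stores its standard output in LINEnew.
# awk splits on runs of blanks by default, and %02d zero-pads each field
# and drops the fractional part of the seconds.
LINEnew=$(printf '%s\n' "$line" |
    awk '/^#/ {printf "%04d.%02d.%02d.%02d.%02d.%02d\n", $2, $3, $4, $5, $6, $7}')

echo "$LINEnew"
```

The variable can then be compared with an ordinary string test such as [ "$LINEnew" = "$other" ], where other is whatever value it needs to match.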

Kevin Lee · Answer 2 · 2013-05-02T02:39:16.540

1

echo "# 2007  4 29 10  1 17.98 blah  other   stuff" | tr -s " "

I use tr in conjunction with cut because the number of spaces between fields varies; tr -s ' ' squeezes each run of spaces down to a single space.

Then use cut twice: once to drop the # (unless you want that as a field), and a second time to pick a field, say the fourth:

echo "# 2007  4 29 10  1 17.98 blah  other   stuff" | tr -s " " | cut -d'#' -f2 | cut -d' ' -f4
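
A field picked this way still lacks its leading zero. As a rough sketch of how the tr and cut pipeline could be carried through to the requested format, assuming a POSIX shell and space-padded input as in the question (the variable names line and stamp are illustrative only):

```shell
line='# 2007  4 29 10  1 17.98 blah  other   stuff'

# Squeeze runs of spaces, then keep fields 2-7 (year through seconds).
# The command substitution is left unquoted on purpose so the shell
# splits the six values into the positional parameters $1..$6.
set -- $(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2-7)

# ${6%%.*} drops the fractional seconds; %02d adds the leading zeros.
# Caveat: printf reads a leading 0 as octal, so this assumes values are
# space-padded (never written as 08 or 09), as in the question's data.
stamp=$(printf '%04d.%02d.%02d.%02d.%02d.%02d' "$1" "$2" "$3" "$4" "$5" "${6%%.*}")

echo "$stamp"
```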

edited May 02 '13 at 02:39

answered May 02 '13 at 02:33

Kevin Lee

