I have a text file that can have any number of fields, each separated by a comma. My script reads the file line by line, counts how many fields are populated on each line, and works out how many commas I need to append to the end of that line so that every field is represented. For instance, a file that looks like this:

Address,nbItems,item1,item2,item3,item4,item5,item6,item7    
2325988023,7,1,2,3,4,5,6,7
2327036284,5,1,2,3,4,5
2326168436,4,1,2,3,4

Should become this:

Address,nbItems,item1,item2,item3,item4,item5,item6,item7
2325988023,7,1,2,3,4,5,6,7
2327036284,5,1,2,3,4,5,,
2326168436,4,1,2,3,4,,,

My script below works, but it seems terribly inefficient. Is it the line-by-line reading that struggles on large files? Is it the sed call that causes the slowdown? Is there a better way to do this?

#!/bin/bash

lineNum=0
numFields=`head -1 File.txt | egrep -o "," | wc -l`

cat File.txt | while read LINE
do
        lineNum=`expr 1 + $lineNum`
        num=`echo $LINE | egrep -o "," | wc -l`
        needed=$(( numFields - num ))
        for (( i=0 ; i < $needed ; i++ ))
        do
                sed -i "${lineNum}s/$/,/" File.txt
        done
done
Chris Seymour
ssbsts

2 Answers

This type of thing is usually best done with a language like awk, for example:

awk 'NR==1{n=NF}{$n=$n}1' FS=, OFS=, file
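For readers unfamiliar with the idiom, here is a spelled-out sketch of the same idea: assigning to a field beyond NF makes awk create the missing fields as empty strings and rebuild the record using OFS. The function name `pad_fields` and the sample input are assumptions made for illustration; the one-liner above assigns `$n=$n` on every line, whereas this variant only touches short lines, which should give the same output.

```shell
#!/bin/bash
# pad_fields is a hypothetical helper name; it reads CSV on stdin
# and writes the padded CSV to stdout.
pad_fields() {
    awk '
        BEGIN   { FS = OFS = "," }   # split and rejoin fields on commas
        NR == 1 { n = NF }           # the header sets the target field count
        NF < n  { $n = "" }          # creating field n also creates the empty fields before it
                { print }            # print every line, padded or not
    '
}

# Illustrative input (not the original File.txt).
printf '%s\n' 'Address,nbItems,item1,item2,item3' '1,2,1,2' '2,1,1' | pad_fields
```

Because awk reads the file once and never rewrites it in place, the cost is a single pass, unlike the original script, which starts a new sed process for every missing comma.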
Scrutinizer

Here's a full bash solution.

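(
    IFS=","
    # Read the header once; its field count is the target width.
    read -r hdrLine
    echo "$hdrLine"
    read -r -a header <<< "$hdrLine"
    numFields="${#header[@]}"

    while read -r -a line; do
        # Append empty elements until the line has numFields fields.
        pad=${#line[@]}
        while (( pad < numFields )); do
            line[pad++]=
        done
        # "${line[*]}" joins the elements on the first character of IFS (a comma).
        echo "${line[*]}"
    done
) < File.txt > newFile.txt
mv newFile.txt File.txt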

The awk solution is far better; this is best viewed as a bash demo.

chepner
  • Thanks for your input; however, it doesn't actually achieve my goal. From what I can tell, it only appends a single comma to every line, even when not necessary, i.e. when all fields are already accounted for. – ssbsts Mar 01 '13 at 23:44
  • That's what I get for not testing first. I could have sworn I read recently that the array would be filled with the intermediary slots if you assigned to a larger index. I wonder what I'm thinking of, because it sure does not appear to be `bash`! I'll leave this answer for a bit to see if I can salvage it; otherwise I'll delete. – chepner Mar 01 '13 at 23:52
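
The exchange above comes down to bash arrays being sparse: assigning to an index past the end creates only that one element and leaves the slots in between unset. The sketch below illustrates this under the assumption that the earlier revision of the answer (not shown here) assigned only to the last index; the function name `join_sparse` is hypothetical.

```shell
#!/bin/bash
# join_sparse is a hypothetical demo function, not part of either answer.
join_sparse() {
    local IFS=,
    local -a line=(2326168436 4 1 2 3 4)   # 6 fields, as in the sample data
    line[8]=                               # assign only the last wanted index (0-based)
    # Indices 6 and 7 stay unset, so the array holds 7 elements, not 9.
    echo "${#line[@]} ${line[*]}"
}

join_sparse   # prints: 7 2326168436,4,1,2,3,4,
```

Only one trailing comma appears, which would explain why each missing slot has to be assigned explicitly in a loop, as the answer's current code does.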