I am trying to do some text file manipulation in Linux.

I have a file called names.txt that looks like this:

A1
X12
B4
Y5
C10
Z23
B8
C3
Z6

And I need it to look like this:

A01
B04
B08
C03
C10
X12
Y05
Z06
Z23

GOAL: I need to zero-pad the single digits, sort the results alphabetically, and save them to the file sorted_names.txt.

I'm thinking I need to count the number of characters on each line first, and if a line has fewer than 3 characters (one letter and one digit), add a zero. Lastly, I would need to sort alphabetically.

For starters, I think this is how to count the number of characters per line:

cat names.txt | while IFS= read -r line
do

  # wc -c also counts the newline that echo appends, so "B4" gives 3
  count=$(echo "$line" | wc -c)
  echo "$line" "$count"

done

Then my thought was to loop through count:

for COUNT in $count
do
    # 3 = one letter + one digit + the newline counted by wc -c
    if [ "$COUNT" = "3" ]
    then
        echo doZeroPadHere
    fi
done
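The plan described in the question (measure each line, pad the short ones, then sort) can also be done in a single loop. This is a minimal sketch, not taken from any of the answers, and it assumes every line is one uppercase letter followed by one or two digits. It uses Bash's `${#line}` (string length) instead of `wc -c`, so no newline is counted and a line that needs padding has length 2. The function name `pad_and_sort` is made up for this illustration:

```shell
# Hypothetical helper sketching the question's plan:
# measure each line, pad the short ones, then sort.
# Assumes every line is one letter followed by one or two digits.
# Reads from stdin and writes the sorted result to stdout.
pad_and_sort() {
    local line
    while IFS= read -r line
    do
        if [ "${#line}" -eq 2 ]; then
            # Letter + single digit: put a 0 between them.
            printf '%s0%s\n' "${line:0:1}" "${line:1}"
        else
            # Already has two digits: print unchanged.
            printf '%s\n' "$line"
        fi
    done | sort
}
```

Usage would be `pad_and_sort <names.txt >sorted_names.txt`.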
Sheila

4 Answers

Is it important to you to use only built-in Bash features? If not, it's easier to use sed and sort:

<names.txt sed 's/^\([A-Z]\)\([0-9]\)$/\10\2/' | sort >sorted_names.txt
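A note on the replacement `\10\2`: sed backreferences only run from `\1` to `\9`, so `\10` is read as `\1` (the letter) followed by a literal `0`, and `\2` is the digit. Lines that already have two digits don't match the anchored pattern and pass through unchanged. A quick way to check this on the sample lines from the question without creating any files (a sketch; the values are copied from the question):

```shell
# Run the question's sample lines through the same sed | sort pipeline.
# \10 in the replacement = backreference \1 followed by a literal 0.
result=$(printf '%s\n' A1 X12 B4 Y5 C10 Z23 B8 C3 Z6 |
    sed 's/^\([A-Z]\)\([0-9]\)$/\10\2/' |
    sort)
printf '%s\n' "$result"
```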
ruakh

Here's a solution using only Bash and sort:

while IFS= read -r line
do
    # first character as-is, the rest as a zero-padded two-digit number
    printf "%s%02d\n" "${line:0:1}" "${line:1}"
done <names.txt | sort >sorted.txt

This reads lines from names.txt and splits each one into its first character (${line:0:1}) and the rest of the line after the first character (${line:1}). It uses printf to print the first character verbatim and the rest of the line as a zero-padded number. It redirects its input from names.txt (avoiding a useless use of cat), pipes the output to sort, and redirects that into sorted.txt.
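To make the two expansions concrete, here is what they give for one sample value from the question, `C3` (a small illustration, not part of the original answer):

```shell
# Illustration with one sample value from the question.
line=C3
first=${line:0:1}                           # first character: C
rest=${line:1}                              # everything after it: 3
padded=$(printf '%s%02d' "$first" "$rest")  # %02d pads 3 to 03
echo "$padded"                              # prints C03
```

For `C10` the same steps give `C` and `10`, and `%02d` leaves `10` as it is.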

Brian Campbell
  • Hi @Brian. When I try this, I get an error ":invalid number4" but the file sorted.txt is still generated correctly. Do you know what the error is and what it means? – Sheila Nov 23 '12 at 09:34
  • @Sheila It happens because one of the lines contains something other than a number after the first character. `${line:1}` means "all of the characters on the line after the first". The `%02d` part of the `printf` format string says "print out as a decimal integer 2 characters wide padded with 0". If the rest of the line isn't a number (for instance, your line says `A0Z` or `D20x` or something), you will get this error. If you want to find out which lines are the problem, add before the `printf`: `[[ ${line:1} =~ [^0-9] ]] && echo $line >&2`; this will print out the lines that cause problems. – Brian Campbell Nov 23 '12 at 17:49
  • Thanks for your comments. The line with the problem was "P24". This doesn't seem to have any different characteristics than the others but it is not included in the sorted.txt file. Any thoughts? – Sheila Nov 24 '12 at 03:14
  • @Sheila I'm not sure; could be a stray invisible character? Maybe a space after the end of the line? Can you post the exact file somewhere? – Brian Campbell Nov 24 '12 at 03:20
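The thread above leaves the `P24` problem open. Two common causes of that kind of symptom, neither confirmed for this particular file, are Windows-style CRLF line endings (the carriage return left at the end of `${line:1}` is not a valid number for `%d`, and printing it can make the terminal overwrite part of the error message) and a last line with no trailing newline (`read` returns failure on it, so a plain `while read` loop skips that line). A sketch of the loop that tolerates both, using a made-up function name `pad_lines`:

```shell
# Hypothetical variant of the answer's loop that tolerates
# CRLF line endings and a missing final newline.
pad_lines() {
    local line
    # "|| [ -n "$line" ]" still processes a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]
    do
        line=${line%$'\r'}   # drop one trailing carriage return, if present
        printf '%s%02d\n' "${line:0:1}" "${line:1}"
    done
}
```

It would be used as `pad_lines <names.txt | sort >sorted.txt`.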

Here's one way using awk and sort:

awk '{ printf "%s%02d\n", substr($0,1,1), substr($0,2) | "sort" }' file

Results:

A01
B04
B08
C03
C10
X12
Y05
Z06
Z23
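In the awk version, `| "sort"` inside the action sends every formatted line into a single `sort` process, which prints its sorted output once awk closes the pipe at exit. Because `sort` inherits awk's standard output, redirecting the awk command is enough to save the result to a file. A sketch wrapped in a hypothetical helper `pad_with_awk`, taking the first character with `substr($0, 1, 1)` since awk string positions start at 1:

```shell
# Hypothetical helper: pads single digits with awk and sorts through
# awk's output pipe. Reads stdin, or the files given as arguments.
pad_with_awk() {
    awk '{ printf "%s%02d\n", substr($0, 1, 1), substr($0, 2) | "sort" }' "$@"
}
```

`pad_with_awk names.txt >sorted_names.txt` would then write the sorted, padded list.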
Steve

Here's a Perl way to do it:

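perl -lpe 's/^([A-Z])(\d)$/${1}0$2/' names.txt | sort >sorted_names.txt

For each line, if it matches exactly one letter and one digit, change it to the letter, a zero, and the digit. Then print. Piping the output through sort and redirecting it produces sorted_names.txt.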

Andy Lester