5

I need to find a faster way to number lines in a file in a specific way using tools like awk and sed. I need the first character on each line to be numbered in this fashion: 1,2,3,1,2,3,1,2,3 etc.

For example, if the input was this:

line 1
line 2
line 3
line 4
line 5
line 6
line 7

The output needs to look like this:

1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7

Here is a chunk of what I have. $lines is the number of lines in the data file divided by 3. So for a file of 21000 lines I process this loop 7000 times.

export i=0
while [ $i -le $lines ]
do
    export start=`expr $i \* 3 + 1`
    export end=`expr $start + 2`
    awk NR==$start,NR==$end $1 | awk '{printf("%d%s\n", NR,$0)}' >> data.out
    export i=`expr $i + 1`
done

Basically this grabs 3 lines at a time, numbers them, and adds to an output file. It's slow...and then some! I don't know of another, faster, way to do this...any thoughts?

Jens
Douglas Anderson

9 Answers

16

Try the nl command.

See https://linux.die.net/man/1/nl (or another link to the documentation that comes up when you Google for "man nl" or the text version that comes up when you run man nl at a shell prompt).

The nl utility reads lines from the named file, or from standard input if the file argument is omitted, applies a configurable line-numbering filter, and writes the result to standard output.

edit: No, that's wrong, my apologies. The nl command doesn't have an option for restarting the numbering every n lines, it only has an option for restarting the numbering after it finds a pattern. I'll make this answer a community wiki answer because it might help someone to know about nl.

Bill Karwin
  • Love the Unix tools attempt at an answer in a scripting question. There is also "cat -n" as a less polished nl. And for the reflective student of sed, the following can be modified to get the exact answer desired: http://www.gnu.org/software/sed/manual/sed.html#cat-_002dn – jaredor Dec 12 '08 at 16:28
  • @jaredor you should add that as an answer! – Pithikos Jun 02 '16 at 11:01
  • The rt.com link has become stale. – crw Jan 10 '17 at 13:48
10

It's slow because you are reading the same lines over and over. Also, you are starting up an awk process only to shut it down and start another one. Better to do the whole thing in one shot:

awk '{print ((NR-1)%3)+1 $0}' "$1" > data.out

If you prefer to have a space after the number:

awk '{print ((NR-1)%3)+1, $0}' "$1" > data.out
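
The single-pass idea above can be wrapped in a small shell function. This is only a sketch: the name number_cyclic and the idea of passing the cycle length as an argument are assumptions for illustration, not part of the answer. It uses printf instead of print so the number and the line are joined without relying on awk's concatenation rules.

```shell
# number_cyclic is a hypothetical helper name (not from the original answer).
# Usage: number_cyclic CYCLE [FILE...]   (reads stdin when no FILE is given)
number_cyclic() {
    cycle=$1
    shift
    # One awk process, one pass: NR is the current line number,
    # so ((NR-1) % n) + 1 counts 1..n and then starts over.
    awk -v n="$cycle" '{ printf "%d%s\n", ((NR - 1) % n) + 1, $0 }' "$@"
}

# Demo on seven generated lines (printf reuses its format for each argument).
printf 'line %d\n' 1 2 3 4 5 6 7 | number_cyclic 3
```

Because awk reads each line exactly once, the run time grows with the size of the file rather than with the number of 3-line groups times the size of the file.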
Jon 'links in bio' Ericson
2

Perl comes to mind:

perl -pe '$_ = (($.-1)%3)+1 . $_'

should work. No doubt there is an awk equivalent. Basically, ((line# - 1) MOD 3) + 1.
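
To see why the formula subtracts 1 before taking the remainder, here is a small worked illustration in shell arithmetic. The helper name cycle_pos is made up for this sketch. A plain n % 3 yields 0 for every third line, while shifting by one first gives the wanted 1,2,3 cycle.

```shell
# cycle_pos is a hypothetical helper: maps a 1-based line number to 1..3.
cycle_pos() {
    echo $(( ($1 - 1) % 3 + 1 ))
}

# Compare the naive remainder with the shifted formula for lines 1 to 7.
for n in 1 2 3 4 5 6 7; do
    printf 'line %d: n %% 3 = %d, ((n-1) %% 3) + 1 = %d\n' \
        "$n" $(( n % 3 )) "$(cycle_pos "$n")"
done
```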

derobert
2

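Another way is to use grep -n with a pattern that matches every line. Note that this numbers lines consecutively, with a colon after the number, rather than restarting after 3. For example, this will enumerate files:

grep -n '.*' <<< "$(ls -1)"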

Output will be:

1:file.a
2:file.b
3:file.c
Dmitry
2

This might work for you:

 sed 's/^/1/;n;s/^/2/;n;s/^/3/' input
potong
1
awk '{printf "%d%s\n", ((NR-1) % 3) + 1, $0;}' "$@"
Jonathan Leffler
1

Python

import sys
# enumerate() counts from 0, so count % 3 is already 0, 1, 2; add 1 to get 1, 2, 3.
for count, line in enumerate(sys.stdin):
    sys.stdout.write("%d%s" % (1 + (count % 3), line))
S.Lott
1

You don't need to leave bash for this:

i=0; while IFS= read -r; do echo "$((i++ % 3 + 1)) $REPLY"; done < input
PEZ
0

This should solve the problem. $_ prints the whole line because _ is an uninitialized awk variable, which evaluates to 0, so $_ is the same as $0.

awk '{print ((NR-1)%3+1) $_}' < input
1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7

# cat input 
  line 1
  line 2
  line 3
  line 4
  line 5
  line 6
  line 7
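
A short check of how $_ behaves in awk, as a sketch. The helper names whole_line and second_field are invented for this illustration. Since _ is just an ordinary variable name, $_ only means "the whole line" while _ is unset; writing $0 states the intent directly.

```shell
# whole_line and second_field are hypothetical demo helpers.
whole_line()   { awk '{ print $_ }'; }          # _ is unset, so $_ is $0
second_field() { awk -v _=2 '{ print $_ }'; }   # _ is 2, so $_ is $2

echo 'alpha beta' | whole_line      # prints: alpha beta
echo 'alpha beta' | second_field    # prints: beta
```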
Ganesh M