-1

I have a big data set and would like to check if every third line has the desired number of bases.

example:

line 1
line 2
ATTGAC
line 4
line 5
TTCGGATC
line 7
line 8
GGTCAA

So line 6 contains 8 bases instead of 6. I would like my script to stop if this is the case.

3 Answers

2

Sounds like a job for awk:

awk 'NR % 3 == 0 && length($0) != 6 { print "line " NR " is the wrong length"; exit }' file

When the record number NR is a multiple of 3 and the length of the line isn't 6, print the message and exit.

Output from your example (assuming that all those blank lines aren't supposed to be there):

$ awk 'NR % 3 == 0 && length($0) != 6 { print "line " NR " is the wrong length"; exit }' file
line 6 is the wrong length
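
Since the question asks for the script to stop, here is a minimal sketch of the same check wrapped so that a calling script can react to it. The function name check_bases and the optional length argument are assumptions made for illustration, not part of the command above; the only change to the awk program is that it exits with status 1 when it finds a bad line.

```shell
# check_bases FILE [LEN]
# Hypothetical wrapper around the awk one-liner above.
# Prints the first offending line number and returns 1 if any
# third line is not exactly LEN characters long (default 6).
check_bases() {
    awk -v len="${2:-6}" '
        NR % 3 == 0 && length($0) != len {
            print "line " NR " is the wrong length"
            exit 1          # non-zero status tells the caller to stop
        }
    ' "$1"
}

# Possible use inside a larger script (the file name is a placeholder):
#   check_bases reads.txt 6 || exit 1
```

Because awk exits at the first mismatch, the rest of a large file is never read.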
Tom Fenech
0

If you only want to check whether any line is longer than 6 characters, you can use wc -L, which gives you the maximum line length. To grab only every third line, GNU sed can be used with an n~m address (every m'th line starting with the n'th). This one-liner returns the maximum line length of lines 3, 6, 9, ...

sed -n '0~3p' foo | wc -L
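
To actually stop on a bad file, the result can be compared against the expected length. This is only a sketch: the function name max_third_line_len and the file name are assumptions, it relies on the GNU versions of sed and wc, and it only catches lines that are too long, not ones that are too short.

```shell
# max_third_line_len FILE
# Hypothetical helper: prints the length of the longest line among
# lines 3, 6, 9, ... Requires GNU sed (0~3 address) and GNU wc (-L).
max_third_line_len() {
    sed -n '0~3p' "$1" | wc -L
}

# Possible use inside a larger script (the file name is a placeholder):
#   if [ "$(max_third_line_len foo)" -gt 6 ]; then
#       echo "a third line is longer than 6 bases" >&2
#       exit 1
#   fi
```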
Binsch
-3

You can determine the number of chars in a Bash variable with ${#VarName}.
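
For illustration only, this is roughly how ${#VarName} could be applied to the question. The function name check_bases_loop and its arguments are assumptions, and a shell read loop like this is usually much slower than awk on a big data set.

```shell
# check_bases_loop FILE [LEN]
# Hypothetical sketch: reads FILE line by line and uses ${#line}
# to test the length of every third line (default LEN is 6).
check_bases_loop() {
    local file="$1"
    local want="${2:-6}"
    local n=0
    local line
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if [ $((n % 3)) -eq 0 ] && [ "${#line}" -ne "$want" ]; then
            echo "line $n is the wrong length"
            return 1        # stop at the first mismatch
        fi
    done < "$file"
}
```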

  • 2
    Downvote: Even if this was expanded to actually show how this could be done, processing a file line by line in a shell loop should generally be avoided. – tripleee Mar 01 '16 at 16:37