4

I have a fasta file like the test one here:

>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT

When i type in a simple grep command like:

grep -B1 "CTT" test.fasta

I get a really strange output in which "--" is sometimes placed on a newline above the grep hit like so:

>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT

I can't figure out why some fasta entries have this and others don't. I don't get this problem when i remove the -B1. I can remove those lines from my file with a grep -v "--" statement, but I'd really like to understand what's going on here.

michberr
  • 303
  • 1
  • 10

2 Answers2

4

You are asking for one line of leading context by using the -B1 option. This means grep will display both the line which matched and the line directly before it. Each match will be separated by -- on a line by itself as shown below:

$ man grep | grep -B1 context
     -A num, --after-context=num
             Print num lines of trailing context after each match.  See also
--
     -B num, --before-context=num
             Print num lines of leading context before each match.  See also
--
     -C[num, --context=num]
             Print num lines of leading and trailing context surrounding each
--
     --context[=num]
             Print num lines of leading and trailing context.  The default is

The reason you aren't seeing -- between every match is that the context is only displayed above a sequence of consecutive matches. So see the following example:

seq 13 | grep -B1 1
1
--
9
10
11
12
13

The seq command produces all the numbers between 1 and 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --, then the one line context, then the group of consecutive matching lines.

chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66
  • 1
    Thanks @chthonicdaemon, my confusion is to why only **some** matches are separated by `--` and others are not – michberr Jun 22 '16 at 04:34
  • 1
    The `--` will only be put in if there are non-consecutive matches. I'll update my answer. – chthonicdaemon Jun 22 '16 at 04:35
  • At the risk of stating the obvious, you can remove the inter-match separators using eg `grep -B1 "CTT" test.fasta | grep -v ^--`. – clstrfsck Jun 22 '16 at 05:21
0

GREP_COLORS section of the grep manpage says :

Specifies the colors and other attributes used to highlight various > parts of the output. Its value is a colon-separated list of capabilities that defaults to ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and ne boolean capabilities omitted (i.e., false).

and

se=36
SGR substring for separators that are inserted between selected line fields (:), between context line fields, (-), and between groups of adjacent lines when nonzero context is specified (--). The default is a cyan text foreground over the terminal's default background.

Consider file sample.txt :

$cat sample.txt
ABBB
AAB
AAB
S
S
S
AABB
ABAA
BAA
CCC
$grep -B2 'AAB' sample.txt
ABBB
AAB
AAB
--
S
S
AABB

Here -- is the way of grep to tell you that AAB before -- and S after -- are not adjacent lines in the actual file.

Мона_Сах
  • 322
  • 3
  • 12