I have a text file containing two lines of a sample DNA sequence. Using pcregrep, I want to find matches of the pattern "CCC", especially matches that span multiple lines (see the end of line 1 and the beginning of line 2 in test.txt below).
test.txt:
AGAGUGGCAAUAUGCGUAUAACGAUUAUUCUGGUCGCACCCGCCAGAGCAGAAAAUAUUGGGGCAGCGCC
CAUGCUGGGUCGCACAUGGAUCUGGUGAUAUUAUUGAUAAUAUUAAAGUUUUCCCGACAUUGGCUGAAUA
Using the command:
pcregrep -M --color "C[\n]?C[\n]?C" test.txt
Returns:
AGAGUGGCAAUAUGCGUAUAACGAUUAUUCUGGUCGCA**CCC**GCCAGAGCAGAAAAUAUUGGGGCAGCG**CC**
**C**CAUGCUGGGUCGCACAUGGAUCUGGUGAUAUUAUUGAUAAUAUUAAAGUUUU**CCC**GACAUUGGCUGAAUA
It correctly highlights the two C's at the end of line 1. However, it then prints the highlighted first C of line 2 and follows it with the entire second line, so that C appears twice in the output.
What am I doing wrong here and how can I avoid the duplication of 'C' in line 2?
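As a cross-check that does not depend on pcregrep's multiline output, the lines can be joined before searching. This is only a minimal sketch: it assumes standard tr, grep and wc are available, that losing the original line boundaries is acceptable, and it counts non-overlapping matches only. It recreates test.txt so that it can be run on its own.

```shell
# Recreate the two-line sample file from the question.
printf '%s\n' \
  'AGAGUGGCAAUAUGCGUAUAACGAUUAUUCUGGUCGCACCCGCCAGAGCAGAAAAUAUUGGGGCAGCGCC' \
  'CAUGCUGGGUCGCACAUGGAUCUGGUGAUAUUAUUGAUAAUAUUAAAGUUUUCCCGACAUUGGCUGAAUA' \
  > test.txt

# Remove the newlines so the sequence becomes one long line, print each
# non-overlapping "CCC" match on its own line, then count those lines.
count=$(tr -d '\n' < test.txt | grep -o 'CCC' | wc -l | tr -d ' ')

echo "CCC matches across the joined sequence: $count"
```

This does not show where the matches fall relative to the original lines, so it is no substitute for the highlighted pcregrep output, only a way to confirm how many matches there should be.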