Input file:
>AMSF107-09|Perciformes|COI-5P|GU661092
TAGTA-
>AMSF114-09|Perciformes|COI-5P|GU661101
C-ACGC
>ANGBF3683-12|Haemulon_sp._B_JJT-2012|COI-5P|JQ741244
-GCAGTT-CA-
I want to replace the hyphens in TAGTA-, C-ACGC, and -GCAGTT-CA- with N's but leave the headers (the lines that start with >) intact. I'm looking for a regex that will match a hyphen next to an A, C, G, or T but skip any line that begins with the > character.
Desired output:
>AMSF107-09|Perciformes|COI-5P|GU661092
TAGTAN
>AMSF114-09|Perciformes|COI-5P|GU661101
CNACGC
>ANGBF3683-12|Haemulon_sp._B_JJT-2012|COI-5P|JQ741244
NGCAGTTNCAN
EDIT:
I only know the very basics of regex. So far I've tried (ACGT)?\-(ACGT)? but that matches every hyphen, including the ones in the headers.
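For reference, here is a minimal sketch of one way to get the desired output. It is written in Python, which is an assumption since the question does not name a tool, and mask_gaps is a hypothetical helper name. Two things about the attempt above: (ACGT) is a group that matches the literal string ACGT, not a character class like [ACGT], and because both groups are optional the pattern reduces to "any hyphen". Rather than testing the neighbours of each hyphen, the sketch selects whole lines that do not start with > using a negative lookahead and replaces the hyphens inside them.

```python
import re


def mask_gaps(fasta_text: str) -> str:
    """Replace '-' with 'N' on sequence lines, leaving '>' header lines untouched."""
    # (?m) lets ^ and $ anchor at every line boundary.
    # (?!>) is a negative lookahead: the line must not begin with '>'.
    return re.sub(
        r"(?m)^(?!>).*$",
        lambda match: match.group(0).replace("-", "N"),
        fasta_text,
    )
```

Calling mask_gaps on the contents of the input file should leave the three header lines unchanged and turn the sequence lines into TAGTAN, CNACGC, and NGCAGTTNCAN.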