
I have a large codon alignment that has a variety of gene names in the headers. The headers are in the following format:

>ENST00000357033.DMD.-1 | CODON | REFERENC

I want to modify all of the headers in the FASTA file to remove everything from the first "." up to (but not including) the space before the first "|". Desired outcome:

>ENST00000357033 | CODON | REFERENC

I've tried a few sed commands, no dice. Any advice? I'm averse to using awk, since I'd like to keep the formatting of the alignment and awk scares me.

Thank you!

2 Answers

sed '/^>/s/\.[^ ]* / /'

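For each line starting with '>', this replaces the first '.' and every non-space character after it, up to and including the next space, with a single space. Lines that do not start with '>' (the sequences) are printed unchanged.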
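To try this out before touching the real alignment, here is a small sketch on a toy file. The file names, the second header, and the sequences are made up for illustration; only the first header comes from the question.

```shell
# Build a tiny example alignment (hypothetical file names and sequences).
printf '%s\n' \
  '>ENST00000357033.DMD.-1 | CODON | REFERENC' \
  'ATGCTTTGG---GAA' \
  '>ENST00000000001.GENE2.1 | CODON | REFERENC' \
  'ATGCTT---TGGGAA' > example.fasta

# Edit only the lines starting with ">"; write the result to a new file
# so the original alignment is left untouched.
sed '/^>/s/\.[^ ]* / /' example.fasta > example.renamed.fasta

cat example.renamed.fasta
```

Writing to a new file rather than editing in place makes it easy to diff the two files and confirm that only the header lines changed.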

Pierre

No need to be scared of awk:

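mawk NF=NF FS='[.][^ ]+' OFS=

With a '.' followed by one or more non-space characters as the field separator and an empty output separator, the header is rebuilt as:

>ENST00000357033 | CODON | REFERENC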
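One thing to watch: as far as I can tell, the one-liner applies the field split to every line, not just headers, so a sequence line that contains '.' characters would be altered too, and empty lines (NF is 0) would not be printed. If that matters for your alignment, a variant restricted to header lines might look like the sketch below. The file names, the second header, and the sequences are made up for illustration.

```shell
# Toy alignment with a dotted sequence line and an empty line
# (hypothetical file names and sequences).
printf '%s\n' \
  '>ENST00000357033.DMD.-1 | CODON | REFERENC' \
  'ATG...TTTGGG' \
  '' \
  '>ENST00000000001.GENE2.1 | CODON | REFERENC' \
  'ATGCCCTTTGGG' > dots.fasta

# On header lines only, delete the first "." and the non-space
# characters that follow it; then print every line, edited or not.
awk '/^>/ { sub(/\.[^ ]*/, "") } { print }' dots.fasta > dots.renamed.fasta

cat dots.renamed.fasta
```

This uses only sub() and a plain pattern, so it should behave the same in mawk, gawk, and other POSIX awks.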
RARE Kpop Manifesto