I have a FASTA file, myfasta.fasta, that looks like this:
>1_CDS
AAAAATTTCTGGGCCCCGGGGG
AAATTATTA
>2_CDS
TTAAAAATTTCTGGGCCCCGGGAAAAAA
>3_CDS
TTTGGGAATTAAACCCT
>4_CDS
TTTGGGAATTAAACCCT
>5_rRNA
TTAAAAATTTCTGGGCCCCGGGAAAAAA
>6_tRNA
TTAAAAATTTCTGGGCCCCGGGAAAAAA
I want to separate the sequences based on patterns in their IDs, such as 'CDS' or 'tRNA'. In the code below I combine startswith with a substring check on each line, but it does not filter anything. Can someone please show me how to test for two conditions on a line in Python?
Code (run as: python mycode.py myfasta.fasta):
#!/usr/bin/env python
import sys
import os

myfasta = sys.argv[1]
fasta = open(myfasta)
for line in fasta:
    if line.startswith('>') and 'CDS' in line:
        print(line)
    else:
        print(line)
Expected output (if I use CDS):
>1_CDS
AAAAATTTCTGGGCCCCGGGGG
AAATTATTA
>2_CDS
TTAAAAATTTCTGGGCCCCGGGAAAAAA
>3_CDS
TTTGGGAATTAAACCCT
>4_CDS
TTTGGGAATTAAACCCT
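
For reference, here is a minimal sketch of one way the expected output above might be produced. The idea is that the two conditions only apply to header lines, so the result of the check has to be remembered for the sequence lines that follow. The function name filter_fasta and the keep flag are hypothetical names chosen for illustration, and the sketch assumes every record starts with a '>' header line and that the pattern only needs to appear somewhere in that header.

```python
import io


def filter_fasta(lines, pattern):
    """Yield the lines of every record whose header contains `pattern`.

    `lines` is any iterable of text lines (an open file works).
    filter_fasta is a hypothetical helper, not part of any library.
    """
    keep = False
    for line in lines:
        if line.startswith('>'):
            # A header starts a new record: decide once per record
            # and remember the decision for its sequence lines.
            keep = pattern in line
        if keep:
            # Strip the trailing newline so print() does not double it.
            yield line.rstrip('\n')


# Small in-memory example; with a real file, pass open(sys.argv[1]) instead.
sample = io.StringIO(
    ">1_CDS\nAAAA\nTTT\n>5_rRNA\nGGGG\n>2_CDS\nCCCC\n"
)
for line in filter_fasta(sample, 'CDS'):
    print(line)
```

Because the flag is only updated on header lines, multi-line sequences such as the one under >1_CDS stay attached to their header. Matching with 'in' is a plain substring test, so a pattern like 'RNA' would match both rRNA and tRNA headers; a stricter check such as line.rstrip().endswith('_CDS') may be preferable if the IDs always follow that layout.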