I have been given an assignment for a project, and I have no previous programming experience. It asks me to create a motif finder using while loops, increments and Booleans. I believe I am on the right track, but I am very uncertain. Can anybody help me find my mistakes and tell me what I need to do to correct them? Again I am a biology guy asked to take this on and

>gi|14578797|gb|AF230943.1| Vibrio hollisae strain ATCC33564 Hsp60 (hsp60) gene, partial cds
CGCAACTGTACTGGCACAGGCTATCGTAAGCGAAGGTCTGAAAGCCGTTGCTGCAGGCATGAACCCAATG
GACCTGAAGCGTGGTATTGACAAAGCGGTTGCTGCGGCAGTTGAGCAACTGAAAGCGTTGTCTGTTGAGT
GTAATGACACCAAGGCTATTGCACAGGTAGGTACCATTTCTGCTAACTCTGATGAAACTGTAGGTAACAT
CATTGCAGAAGCGATGGAAAAAGTAGGCCGCGACGGTGTTATCACTGTTGAAGAAGGTCAGTCTCTGCAA
GACGAGCTGGATGTGGTTGAAGGTATGCAGTTTGACCGCGGCTACCTGTCTCCATACTTCATCAACAACC
AAGAGTCTGGTTCTGTTGATCTGGAAAACCCATTCATCCTGCTGGTTGACAAAAAAGTATCAAACATCCG
CGAACTGCTGCCTACTCTGGAAGCCGTCGCGAAATCTTCACGTCCACTGCTGATCATCGCTGAAGACGTA
GAAGGTGAAGCACTGGCAACACTGGTTGTAAACAACATGCGTGGCATCGTAAAAGGGCAGCAGTT

>gi|14578795|gb|AF230942.1| Photobacterium damselae strain ATCC33539 Hsp60 (hsp60) gene, partial cds
GGCTACAGTACTGGCTCAAGCAATTATCACTGAAGGTCTAAAAGCGGTTGCTGCGGGTATGAACCCAATG
GATCTTAAGCGTGGTATCGACAAAGCAGTAGTTGCTGCTGTTGAAGAGCTAAAAGCACTATCTGTTCCTT
GTGCTGACACTAAAGCGATTGCTCAGGTAGGTACTATCTCTGCAAACTCTGATGCAACTGTGGGTAACCT
AATTGCAAAAGCTATGGATAAAGTTGGTCGTGATGGTGTTATCACGGTTGAAGAAGGCCAAGCGCTACAA
GATGAGTTAGATGTAGTTGAAGGTATGCAGTTCGATCGCGGTTACCTATCTCCATACTTCATCAACAACC
AACAAGCAGGTGCGGTGGAGCTAGAAAGCCCATTTATCCTTCTTGTTGATAAGAAAATCTCTAACATCCG
TGAGCTATTACCAGCACTAGAAGGCGTTGCAAAAGCATCTCGTCCTCTACTGATCATCGCTGAAGATGTT
GAAGGTGAAGCACTAGCAACACTGGTTGTGAACAACATGCGCGGCATTGTTAAAGTTGCTGCTGTT

I am in need of some help.

import re

#function parsing header for sequence
def fasta_splitter(x):
    boo=0
    seq = ""
    i=0
    while i < len(lines)
         if line[0] ==">"and boo ==0
        line[i] = header
        boo = 1
        i=1+i
        elif line [i][0] ==">"
        header=line[0]
        seq=""
        i=i+1
        else
        seq=seq+line[i]
            print ("header" + "seq")

#open file and read file by command line
x=open('C:\\Python27\\fasta.py.txt','r+')
lines = x.readlines()
fasta_splitter(lines)
#split orgnaism details from actual bases
# not sure how to call defined function 

re.search(pattern, string)
# renaming string seq to dna
seq ="x"
m = re.search(r"GG(ATCG)GTTAC",dna)   
print "m"
AChampion
jad12

1 Answer

For starters, read the FASTA file with the Bio.SeqIO module from Biopython, so you don't have to write this fasta_splitter monstrosity. Biopython is generally great. Second, you've messed just about everything up. You call re.search(pattern, string) without having defined either pattern or string, so that line would raise a NameError. Then you write

seq="x"
...
print "m"

In both cases you use the literal strings "x" and "m", when what you need are the variable names. The correct version would be

seq = x
...
print(m)
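
To make the point about variable names concrete, here is a minimal sketch of a search in which both arguments are defined before re.search is called. The helper name find_motif and the short sequence are hypothetical, chosen only for illustration. It also assumes that the (ATCG) in the question's pattern was meant as "any one base": parentheses form a group that matches the four literal letters ATCG, while the character class [ATCG] matches a single base.

```python
import re


def find_motif(pattern, dna):
    """Return (start, matched_text) for the first match, or None.

    find_motif is a hypothetical helper, not part of the re module.
    """
    m = re.search(pattern, dna)
    if m is None:
        return None
    return (m.start(), m.group())


# Hypothetical sequence, for illustration only (not from the FASTA file).
dna = "ATGGCGTTACCA"

# [ATCG] is a character class: exactly one base, whichever it is.
print(find_motif(r"GG[ATCG]GTTAC", dna))

# (ATCG) is a group: it only matches the literal text "ATCG".
print(find_motif(r"GG(ATCG)GTTAC", dna))
```

Printing the match object's start() and group() is usually more useful than printing the match object itself.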

All of this assumes it is a student assignment and not actual research. In the latter case it's generally better to use a modern motif-finding tool: those are more sensitive and biologically correct than any bunch of regexes could be.
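
If the assignment really does require a hand-written parser with a while loop, a counter and a Boolean flag, the following is one possible sketch rather than the expected solution. It assumes every header line starts with ">" as in standard FASTA, and the example_lines list is made up for illustration; in the real script the lines would come from readlines() on the opened file.

```python
def fasta_splitter(lines):
    """Return a list of (header, sequence) pairs from FASTA-formatted lines."""
    records = []
    header = ""
    seq = ""
    in_record = False  # Boolean flag: True once the first header has been seen
    i = 0              # counter used to walk through the lines
    while i < len(lines):
        line = lines[i].strip()  # drop the trailing newline
        if line.startswith(">"):
            if in_record:
                # A new header closes the previous record.
                records.append((header, seq))
            header = line[1:]
            seq = ""
            in_record = True
        elif in_record and line:
            # Sequence lines are joined into one long string.
            seq = seq + line
        i = i + 1  # always advance, otherwise the loop never ends
    if in_record:
        # Store the last record, which has no header after it.
        records.append((header, seq))
    return records


# Hypothetical input, for illustration only.
example_lines = [
    ">seq1 hypothetical record\n",
    "ATGGCG\n",
    "TTACCA\n",
    "\n",
    ">seq2 hypothetical record\n",
    "CCCGGG\n",
]

for header, seq in fasta_splitter(example_lines):
    print(header, len(seq))
```

Each returned sequence is a plain string, so it can be passed straight to re.search as the second argument.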

Synedraacus