I'm a beginner with Python and am playing around with various ways to do the simple task of reverse-complementing a DNA or RNA sequence, as a way to learn some string functions. My latest method nearly works, except for a minor irritant that I can't find an answer to, probably because I'm using something I don't properly understand. My function is designed to write a blank file (this works!), then open a file containing the sequence and loop through it one character at a time, writing the reverse complement to the new file. Here's the code:
def func_rev_seq(in_path,out_path):
    """
    Read file one character at a time and return the reverse complement of each nucleotide to a new file
    """
    # Write a blank file (out_path)
    fb = open(out_path,"w")
    fb.write("")
    fb.close()
    # Dictionary where the key is the nucleotide and the value is its complement
    base = {"A":"T", "C":"G", "G":"C", "T":"A", "a":"t", "c":"g", "g":"c", "t":"a", "k":"m", "m":"k", "y":"r", "r":"y", "b":"v", "v":"b", "d":"h", "h":"d", "K":"M", "M":"K", "Y":"R", "R":"Y", "B":"V", "V":"B", "D":"H", "H":"D", "U":"A", "u":"a"}
    # Open the source file (in_path) as fi
    fi=open(in_path,"r")
    i = fi.read(1)
    # Loop through the source file one character at a time and write the complement to the start of the output file
    while i != "":
        i = fi.read(1)
        if i in base:
            b = base[i]
        else:
            b = i
        with open(out_path, 'r+') as fo:
            body = fo.read()
            fo.seek(0, 0)
            fo.write(b + body)
    fi.close()
    fo.close()
The problem is that when I run the function, the string in the output file is, firstly, truncated by a single character and, secondly, sits below a blank line that I don't want. (Screenshot: input and output file examples.) As I understand it, the seek function with (0, 0) ought to refer to the start of the file, but I may have misunderstood. Any help greatly appreciated, thanks!
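For reference, the result I'm expecting is what the in-memory sketch below would produce. It is only an illustration of the intended output, not the file-based approach I'm trying to get working: it assumes the whole sequence fits in memory as a string, and the name reverse_complement is made up for this example. The translation table holds the same pairs as my base dictionary.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA/RNA string (illustrative helper)."""
    # Same nucleotide pairs as the `base` dictionary, upper and lower case.
    table = str.maketrans(
        "ACGTUKMYRBVDHacgtukmyrbvdh",
        "TGCAAMKRYVBHDtgcaamkryvbhd",
    )
    # Complement every character, then reverse the whole string.
    # Characters not in the table (e.g. "N") pass through unchanged.
    return seq.translate(table)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

So for an input file containing ATGC I would expect the output file to contain exactly GCAT, with no blank line above it and no missing character.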