I am trying to create a program that stores multiple tRNA sequences in a dictionary. I have set up my code to extract and store the sequences, along with the names associated with them, as:
class Unique:
    def __init__(self, head='', seq=''):
        self.head = head
        self.sequence = seq
        self.original = {}  # header -> cleaned sequence
        # Collect every non-empty substring of seq.
        self.substrings = set()
        for s in range(len(seq)):
            for e in range(s + 1, len(seq) + 1):
                self.substrings.add(seq[s:e])

    def cleaner(self):
        # myReader is assumed to be a FASTA reader object created elsewhere.
        for header, sequence in myReader.readFasta():
            clean = sequence.replace('-', '').replace('_', '')
            self.original[header] = clean
        return self.original

    def sites(self):
        self.cleaner()
I am calling the sites function (which is why it runs cleaner as its first step), but I am lost on how to write code that finds the unique strings in each stored sequence.
As an example, if I have two sequences:
UCGUUAGC
AGCGCAUU
The program would be able to tell me that the first sequence's unique string is UCG and the second's is GCG, since UCG is ONLY present in the first sequence and GCG is only present in the second.
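One possible way to approach this, sketched under the assumption that the cleaned sequences are already in a dictionary of header to sequence: build the set of all substrings for each sequence, count how many sequences each substring appears in, and keep the ones that appear in exactly one. The names all_substrings, unique_substrings, seq1, and seq2 are made up for illustration, and this brute-force version uses memory that grows quadratically with sequence length, so it is only a starting point.

```python
from collections import Counter


def all_substrings(seq):
    """Return the set of every non-empty substring of seq."""
    return {seq[s:e] for s in range(len(seq)) for e in range(s + 1, len(seq) + 1)}


def unique_substrings(sequences):
    """Map each header to the substrings found in that sequence and no other.

    `sequences` is a dict of header -> sequence string (hypothetical layout).
    """
    subs = {head: all_substrings(seq) for head, seq in sequences.items()}
    # Each sequence contributes a set, so a substring is counted once per sequence.
    counts = Counter(sub for found in subs.values() for sub in found)
    return {
        head: {sub for sub in found if counts[sub] == 1}
        for head, found in subs.items()
    }


example = {'seq1': 'UCGUUAGC', 'seq2': 'AGCGCAUU'}
result = unique_substrings(example)
print('UCG' in result['seq1'])  # True: UCG occurs only in the first sequence
print('UCG' in result['seq2'])  # False
```

Because every longer strand that contains a unique substring is itself unique, the resulting sets can get large; they may need filtering by length depending on what counts as a useful site.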
EDIT: What I mean by a unique sequence: any strand of a sequence that I can look at and immediately know which sequence it came from. So if the strand UCGA only exists in one sequence, it is counted and saved as a unique strand associated with that sequence.
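Under this definition, any extension of a unique strand is also unique, so it may be enough to keep only the shortest ones. The following is a rough sketch of that idea, not a definitive implementation: it assumes a dict of header to sequence, and the name minimal_unique is hypothetical. A unique strand is kept only if dropping its first or its last character gives a strand that is no longer unique.

```python
def minimal_unique(sequences):
    """Map each header to its shortest identifying strands.

    `sequences` is a dict of header -> sequence string (hypothetical layout).
    A strand is kept if it occurs in no other sequence and none of its
    shorter pieces is already unique to this sequence.
    """
    subs = {
        head: {seq[s:e] for s in range(len(seq)) for e in range(s + 1, len(seq) + 1)}
        for head, seq in sequences.items()
    }
    result = {}
    for head, own in subs.items():
        # Everything that appears in any of the other sequences.
        others = set().union(*(found for h, found in subs.items() if h != head))
        unique = own - others
        # If any shorter piece were unique, then the strand minus its first
        # character or minus its last character would be unique as well.
        result[head] = {
            u for u in unique if u[1:] not in unique and u[:-1] not in unique
        }
    return result


example = {'seq1': 'UCGUUAGC', 'seq2': 'AGCGCAUU'}
minimal = minimal_unique(example)
print(sorted(minimal['seq1']))  # ['GU', 'UA', 'UC']
```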
The sequences extracted look like this:
GAGAGAGACAUAGAGGDUAUGAPGPPGG'UUGAACCAAUAGUAGGGGGUPCG"UUCCUUCCUUUCUUACCA