l1 = "GATATATGCATATACTT"
l2 = "ATAT"

# Stop early enough that l1[i+3] is always a valid index
for i in range(len(l1) - 3):
    if l1[i] == "A" and l1[i+1] == "T" and l1[i+2] == "A" and l1[i+3] == "T":
        print(i + 1)  # report 1-based positions

l1 is the main sequence and l2 is the subsequence that I am trying to find in l1. The code above gives the correct output (2, 4, 10), but is there a better way? I am new to coding, and I suspect this approach will not be efficient for a larger sequence. Thanks!

  • Maybe you are looking for the Rabin-Karp algorithm? Or KMP (Knuth-Morris-Pratt), or Boyer-Moore, depending on your requirements. – Daniel Hao Jun 09 '21 at 23:23
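For readers who want to see what one of the algorithms named in the comment looks like, here is a minimal Knuth-Morris-Pratt sketch. It is an illustration only, not code from the question or the answer: the function name kmp_find_all and the choice to return 1-based positions (to match the question's output) are assumptions made for this example.

```python
def kmp_find_all(text, pattern):
    """Return 1-based start positions of every match, overlapping ones included."""
    if not pattern:
        return []

    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of pattern[:i+1].
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    positions = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 2)  # convert 0-based end index to 1-based start
            k = failure[k - 1]  # keep going so overlapping matches are found
    return positions


print(kmp_find_all("GATATATGCATATACTT", "ATAT"))
```

Each character of the text is visited once, so the running time grows linearly with the combined length of the sequence and the pattern rather than with their product.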

1 Answer


You can use the re module. First convert the l2 string into a regular expression that uses a lookahead, so that overlapping matches are also found:

A(?=TAT)

Then you can use re.finditer:

import re

l1 = "GATATATGCATATACTT"
l2 = "ATAT"

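# Match the first character only if the rest of l2 follows it (lookahead).
# The match consumes one character, so overlapping hits are still found.
search_string = "{}(?={})".format(l2[0], l2[1:])

# m.start() is 0-based; add 1 to report 1-based positions.
out = [m.start() + 1 for m in re.finditer(search_string, l1)]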
print(out)

Prints:

[2, 4, 10]
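
As a possible variation on the same idea, the sketch below wraps the whole pattern in a lookahead and escapes it with re.escape, which should matter only if the pattern could contain regex metacharacters. It also shows a plain str.find loop that avoids regular expressions altogether. The function names find_all_regex and find_all_str are made up for this illustration.

```python
import re


def find_all_regex(text, pattern):
    """1-based positions of all (overlapping) matches, using a zero-width lookahead."""
    # re.escape keeps characters such as "." or "*" literal.
    regex = "(?={})".format(re.escape(pattern))
    return [m.start() + 1 for m in re.finditer(regex, text)]


def find_all_str(text, pattern):
    """Same result using only str.find, restarting one character after each hit."""
    positions = []
    if not pattern:
        return positions
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)
        # Advance by one, not by len(pattern), so overlapping matches are kept.
        start = text.find(pattern, start + 1)
    return positions


l1 = "GATATATGCATATACTT"
l2 = "ATAT"
print(find_all_regex(l1, l2))
print(find_all_str(l1, l2))
```

Both functions take the pattern as an argument, so nothing about "ATAT" is hard-coded.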
Andrej Kesely