I was wondering whether there is an existing way to get all the occurrences of a query, given a substring and the suffix array. I was testing a function that I found here: https://hg.python.org/cpython/file/2.7/Lib/bisect.py, with some modifications. It returns only one of the positions, even when the substring occurs more than once. For instance:
seq = "ATGTGCAAGAATGAGGCAAG$"  # original string
array = [20, 17, 6, 9, 18, 7, 13, 10, 0, 16, 5, 19, 8, 12, 15, 4, 14, 2, 11, 3, 1]  # suffix array

def bisect_left(array, query, seq, lo=0, hi=None):
    if lo < 0:
        raise ValueError('must be non-negative')
    if hi is None:  # by default len(array)
        hi = len(array)
    while lo < hi:
        mid = (lo + hi) // 2  # set the middle to binary search
        if seq[array[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    if not seq[array[lo]:array[lo] + len(query)] == query:
        raise IndexError('there is not any index for the query')
    return array[lo]

print(bisect_left(array, 'ATG', seq))
Running this prints 10, when the expected result is both positions, 0 and 10.
What could be wrong?
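For reference, here is a rough sketch of what I think the intended behaviour would look like. The function above only finds the left edge of the block of matching suffixes, so it can return just one position. A second binary search for the right edge, comparing only the first len(query) characters of each suffix, should give the whole block. The names find_all and prefix are made up for this example and are not part of the bisect module, and I am assuming the suffix array is sorted by plain string comparison (so that '$' sorts before the letters).

```python
seq = "ATGTGCAAGAATGAGGCAAG$"  # original string
array = [20, 17, 6, 9, 18, 7, 13, 10, 0, 16, 5, 19, 8, 12, 15, 4, 14, 2, 11, 3, 1]  # suffix array


def find_all(array, query, seq):
    """Return every start position of query in seq (hypothetical helper)."""
    n = len(query)

    def prefix(i):
        # First n characters of the suffix stored at rank i of the suffix array.
        return seq[array[i]:array[i] + n]

    # Left edge: first rank whose n-character prefix is >= query.
    lo, hi = 0, len(array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Right edge: first rank whose n-character prefix is > query.
    hi = len(array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid

    # Every rank in [start, lo) is a suffix that begins with query.
    return sorted(array[start:lo])


print(find_all(array, 'ATG', seq))  # prints [0, 10]
```

With this version a query that does not occur gives an empty list instead of raising, which may or may not be what is wanted.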