I'm doing some coding with DNA sequences and I'm interested in a function to find sequential repeats (which could represent where primers could 'slip' AKA do bad stuff).
An example of what I'm interested in would be as follows:
longest_repeat('ATTTTCCATGATGATG')
which would output the repeat length and coordinates, in this case 9 long and 7:15. The function should have picked up the ATGATGATG at the end and since it is longer than the TTTT repeat and the TGATGA repeat, it would only report the ATGATGATG. In the case of ties, I'd like if it could report all the ties, or at least one of them.
It would also be nice to set a threshold to only report these sequential repeats if they're over a specific length.
I have some experience in python, but this specific question has me stumped, since if I code it inefficiently and put in a 50 character long string it could take forever. I appreciate all the help!