Hello my question is how to count the longest sequence of AGATC
(meaning AGATCAGATC
is sequence of two) in the very long string.
I need to use python. My code for now looks like this:
pattern = re.compile(r'AGATC')
matches = pattern.finditer(text_to_search)
for match in matches:
agatc += 1
print(agatc)
Number 28 will be printed but I know that 22 is the right answer.
text analyzed here:
GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCCGTCAACTCATTCACACCGCATCCTTTCCTGCCACTGTAACTAGTCGACTGGGGAACCTCATCATCCATACTCTCCCACATTATGCCTCCCAACCTTGTTAAGCGTGGCATGCTTGGGATTGCATTGATGCTTCTTGGAGAGGACGCTTTCGTTTTGGAGATTACAGGGATCCAATTTTATCATCGGTTCGACTCCCGTAACGACTTAGCAGTAAGGGTGCTAGTTCCTGGTTAGAATCTTAATAAATCACGTCGCTTGGAGCAAGACAAAGATCGTCGTAATGCCAAGTGCACGACCACCTTCAGACTTGCAGGACCCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTCGATAGCTATGCGGTTCAATACAATCTTAACGCAATGCAGCGATGTGGTTTCGTACACTTAGCATAAAACCCCCCACATTAAATCGATGTACCCGCCCTCTTAGACGCCAATTTCAATGCCGAACCTCCGGCGGGTATCTCTGCACTAGGAGAAGTAGCACGTCGCTGTAGCGAACTCCTATCGTGAGATAATTTGTAGAGCTGCTCTTATAATACAATAGCTCAGATGGATTATTCCATGGACATCCCCGTGCGTTGTTTCGAGGATGGTAGGTGGAAATTTTGCCAGACCTCTAGTCTTAAACATGGTTGACGTTATAGGCGCTATCTCTTGCGTCTGGAAGTGTTAATCCGTGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACACGCAACTCTGGAGGAGGGCACTGCACTGCAAACTTGCGTAATATCCTTCACCCACACTTGCCTGGCCTCCTTGCTTAAAGCTCTGGCGATGCGATTTTTCGGCCCAGTAGCTGAATAGGTCATGAAATGGGCACCGAACTGGAAAGACCCATATATTCGATACTCACAACTTAATGATAGCGCGATTAAGAGCGACACCAAAAACCAAATTACGTTCACGAACCTTTGAGAGTCAAGGAGACTTAGACCGAATTGAATGATCACTGATGCGCCCGCTGATACTGAGCCTCACCATTAATCGCCGACCAATACGGCGTGTACCGGGCGCGGCCTTGCCGCATAACGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTACACAGCCCCGTCCTCATTGCTAAGTGCACTGGCAACTGGACCTAAAGATTTTTCGAGTATGGCCCTCGAATCAAGCGCCCACCCAGAAACCTACGAGCCAGTAACCCCAGTAAACAAGCATTAGTGCTATATGCTTGCTGCCCACTAGGACCCTTATGGTTCATACCAGGGTGACGTGTCTTGCGGGCCAAGGATGAACCAGAAGCAAGATCCTTAGATGGACGACTGTCTCATTGCTTAAACTCCACATACCAAAGGGCGCGGTAAACGATAGTTTTAGGTAATGTTAGTCGGATGGTTGTCTGCAGCTACCAATACAGCCTGGCACCCAGGGTCTGAACAATAACGCGTGAGAGCAGCTCTCCCGCGTGTGGTGGATTTGCCGTCTATGAAATTGAGGCTCTTGCAACTATTCGCACTCGGAATGCCCTCATATCTGGTGCCTAGCGGCCTTTGCCCCGTGCCGGTAGGACTAAACTCTACGGATCGTTGACGGATCTCGATGTGGAAGATGGTTATGAAAGATAACAACGCGTGTGCTAATTGATTTAGACAAGTATTGCGGCAGTAAAAGATAATCGGCTGCAGAGTTACGAAAGACTTCCATGCATGGATTCCATTCCTTCTAGTATAGGACCCACTCTGAATACACGTCTTGCGGGCCGATCATCTCCACCGCTGCGGAAGAAAGCAATTAAGAATCTATGCTCATTAAGAGTGCGACTATAATGCGGATCTTACAGTGCTAATGATCAGGACGTCGTCCAAGCAGGCTGCATGCCGAATTTAGCTTACGTCAGGATCAGGCGTTATAGCCTGGGAATCGGACTATGAGGACGCCACGACCTCTGGGAGAAAGCTATATACATTGAGGATCGCGCCATCTTTATGAGACTCAAATGAATCTAGATAGGTAGCATTGCGGACTTGAGTTAGCACATCGGTATTGGAAGGTGAGGGTCCTGCCGCTCGTTCTATGTTCGGTTTATAGTATACAAATAGGTCATCCCGAACGTTGAAGTTAAACTCATGACACGTTGTCGTAATGAAACGGGCCTGTTATTAGGGATACAGACAAAAGGCACAAGCTGGCTTGCACATTAAGGCGCACTAGAGATCCTCACAACCGTTGCCCGCACGGAGGTCGTGTCTAACAGACAGTGAACCAGCCGTATTGGGGTGGATGACCTGAGCTTCTTGGGGCCTGTTGTACACCGCGTGTGGTTCAACTGGTACACATACTACGAATATTCGAAATCATTGTACTGTGCTCTTCGGTGCTACTGACTGTGAGCGAATGCATCCCAATCCCAAACAATGCTTGTGGTAGGAGAATTGAAACTCTCGAAGCCTGGCCCAATGTCATCTACTTTTAACATGTCGGGCCAGGAGTTACGGGCATTGCTTACTTACTTTGCCCCCTTACACCACAGCAGCGCGATTCTTGTTGTAGTAGATTTTATACGACTCGCGAATTAAATGGAACTTGTCTGTCCCATATCGATCGTGTCCATCGTAAGATGAGATTGTAGGAGCATTCGGAAGTCTATGCGGCCCAGGGACTACTACGTTAAATCTGGTCAGACGTGGTTTACAAGGCGTCCCGATCTTCTCAGAACATATGGGAAAGCACTACCGTTCCTTCACGCATACAGTTGTTCGTGCCGAACGAGTAAGCTTGCGACCAGCCCACCCGCTAGGGCTATGCAGCGGGTCATGGCTGGCGCCATACTGTGCGGACAACCCACGCTCTGGCAGAAAGCGTCTTGTGTTTTGTAGTAGCTCCAACGGTTAGACCTTCGATATCTATTCAGAGCGCGAGCGACCACTATTAGACGGCATGTAAACAATGTGTATTTGTTCGGCCCAACCGGTATATGGGTAAGACCGCGAAGGGCCTGCGCGAATACCAGCGTCCAAAAATTCCTCACCCGAGATATGCGGTTAGTACCCCTTGGGTAACGGTCCGCTACGGGTAGCGACGCGAGCCGGCCGCATCGGTTGGAGCCGAGTTGTCGGGCAGGCGAGTAACGTGTGCAATTTGATGGGCCCAAGCCTCCGGCACTATCCACCTCATACATCGACAAAAGCACCAAATATGGGGAAAAGCTGAGCGTCGATATGTACATCTACCCAGGAACCGGCCCGAACATTAGGCGGACGTGAATTTCCGACCTAGGTTCGGCTACATTTCTACGATCCAAGCACACGTGAAGGAGGAGGGGTGTTCCGACCGTAAATGAACGAGGTGCGCAGTGACCCGATGGCGTTTAGCGGATAGCCTTCCTATGCCGGCCTATGCTGTATGGTAGTTGGTTGGTGCCTCCAGAGCCACTGCACCCAATCATAGGGTCTACAGCAGCGTACTTATAAAATTGTACGGGTGACCCATATCCATTACGGGTTGCGACCAGTATAGGAGAGTATAACTGCGTGAACTAATGCGTTATGACGCTTCAGAGTTTGCTCGGGCCCGAGTTCTAGGGCTATAATGTGTTAGGGCGCAAGTATGCCAAGCTAAGATGTGGCGTGCACACTAGGAGTTGTGTTCCTCTGCAAGCAGACACGAGCACTCTGGCAGTAGTTTGACCACACCCGGGTATCACTGCTACTCCATTTCGAACAAGCTATTGGAGCGGACAAAATATGCTACTCAAGAGCATTAGTTATAGGTCTACGAGACAGAAGCAGTTACTGAGTCTGAATATTCGATATAAGTAGGCATGGAGGCGGAGCAAAACAACGTCTGCGATCAATCGTGTTGATGACGTATGGCGACTGGAAGGTAAGGACTATGGCCGGACGGAATGATTCATGTTCTGTTCAAAGCTATATTTCGAAGGGGTATATTAGCGGTCCTACACTTGGTTAGCACCCTCCCCCCTCTGGATCCTGCACTAATTCGAGCTGGCCTCCATCGGTATCAGTCCGGAAGCTCCACTCTCTATCGTAGTCCTAATCAACAGGGTGCCAGTTTGCTCACGTGGAAGTTTGAGGCCCTTTGTGCTCCATAGCCAATCACTAACCATGCACGCGCGACCCACTCTACGTCCAGATCGGCTATAATAGTTGCGCCCGGGACTGGCAGAGTAGACATGTAAGCTAGATAGAGCCCCGACATCGGCCAAGAGATCCTACGCTGCTTCCAGATAATGAGAGACATTCTAGCATTAGACATGCAAGTCGGCAGGGACTCCCCTTATCTAGTAATTTCGATGAATTGGTTTTTCGGCTAGCATCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGACCATGCCGACCTCATCATAGAAGGAATGCTCTAAACTTAGAGTGCTACTAGGAAAACTATTAATCAATGATCGTCCTGCTTACATAGCTGGACGGCGAAAGTTCTTATACTGCGGAGGTTGCTGACGTAGAGTGCGCTGGGTACAGCGGATAAGTTGATCAGGGTGGGGATAGGGTGGCTCACCGTTTATACTCATATAGATTCCTGGCGTCGACGCTGTGACAGGGTCGAGATCGAGGGGGAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGCGGAGCGGAGGGAAAATTATCACCAGAGGGTAGGGGCTCGCGACATTCTATTCAATGCATTTCAAGCTACTTACGTATTTCGGCACAGTGACTACTGCCTGCGCGGCAGCCGTAAGGTTTCCCGTCAATAGGTGGCACGTATCATTGATGAAAGTGTCAGCTAATCATTCAGGCCTTA