
Sample Input:

ACGTTGCATGTCGCATGATGCATGAGAGCT # this is the sequence in which we have to search

4 # this is k, the length of the k-mer (an integer)

Sample Output:

CATG GCAT

I do not understand how to do this. Please help me. Thanks in advance.

Domecraft
  • This question is underspecified: "a k-mer of length 4" tells us nothing about the content of the sequence you're looking for. Please provide more information. – THK Nov 13 '13 at 04:33
  • You want to find the substrings of the given length that occur at least twice? – beroe Nov 13 '13 at 05:41
  • Looking for solutions for Bioinformatics Algorithms at Coursera? – chupvl Nov 13 '13 at 21:43

1 Answer


If I understand your question correctly, here is one way to work through the list:

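s = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
n = 4  # k-mer length
# The last start that still leaves room for a non-overlapping copy after it
# is len(s) - 2*n, so range() needs len(s) - 2*n + 1 start positions.
k = len(s) - 2 * n + 1
klist = []
for i in range(k):
    kmer = s[i:i + n]
    # keep each repeated k-mer once; only later, non-overlapping copies count
    if kmer not in klist and kmer in s[i + n:]:
        klist.append(kmer)
print(klist)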
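A one-pass alternative for the same "occurs at least twice" reading can be sketched with `collections.Counter` (the function name `repeated_kmers` is hypothetical). Unlike the loop above, it counts overlapping occurrences too, so the two can differ on inputs such as `AAAAA`:

```python
from collections import Counter


def repeated_kmers(seq, n):
    """Return the length-n substrings of seq that occur at least twice,
    in order of first appearance. Overlapping occurrences are counted."""
    # Count every window seq[i:i+n] in a single left-to-right pass.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Counter keeps insertion order (Python 3.7+), i.e. order of first appearance.
    return [kmer for kmer, c in counts.items() if c >= 2]


s = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
print(repeated_kmers(s, 4))  # ['TGCA', 'GCAT', 'CATG', 'ATGA']
```

For this sequence it gives the same lists as the loop for n = 4, 5 and 6. On `"AAAAA"` with n = 4 it returns `['AAAA']`, whereas the loop finds nothing because the two copies overlap.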

It looks like your example contains a few more repeated k-mers than your expected output lists, unless I am misunderstanding:

['TGCA', 'GCAT', 'CATG', 'ATGA']

For n = 5:

['TGCAT', 'GCATG', 'CATGA']

And even for n = 6:

['TGCATG', 'GCATGA']
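If the expected output is instead the set of *most frequent* k-mers, it matches the sample exactly: CATG and GCAT each occur three times, while TGCA and ATGA occur only twice. A minimal sketch under that assumption (the name `most_frequent_kmers` is hypothetical, and alphabetical output order is an assumption that happens to match the sample):

```python
from collections import Counter


def most_frequent_kmers(seq, k):
    """Return every length-k substring that occurs the maximum number of
    times in seq (overlapping occurrences counted), sorted alphabetically."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:  # seq is shorter than k
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, c in counts.items() if c == best)


s = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
print(" ".join(most_frequent_kmers(s, 4)))  # CATG GCAT
```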
beroe
  • Why are we subtracting `2*n` instead of just `n`? @beroe – jdyg Jan 15 '14 at 20:56
  • @user2094920 - I didn't do a lot of (any) real testing, but since that is the *starting* point of the kmer, I think I was trying to trim off the last 2*n positions, so you would start at `Y`s, but not `X`s for n = 4 with `AAAAAYYYYYXXXX`. Probably it would be better to use `-(n+1)`? – beroe Jan 15 '14 at 21:24
  • On the other hand, `-2*n` finds two kmers with `s='GATGXXXXGATG'` – beroe Jan 15 '14 at 21:32
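On the bound discussed in these comments: a non-overlapping copy of the k-mer starting at `i` must fit inside `s[i+n:]`, which requires `i + 2*n <= len(s)`. The valid starts are therefore `0 .. len(s) - 2*n`, i.e. `range(len(s) - 2*n + 1)`; with `range(len(s) - 2*n)` the last valid start is never tried, while subtracting only `n` just adds starts that can never match. A small check, with the answer's loop wrapped in a hypothetical helper `repeated_kmers_loop` so both bounds can be compared:

```python
def repeated_kmers_loop(s, n, num_starts):
    """The answer's loop, with the number of start positions passed in
    so that different bounds can be compared."""
    klist = []
    for i in range(num_starts):
        kmer = s[i:i + n]
        # only a later copy that does not overlap this one is searched for
        if kmer not in klist and kmer in s[i + n:]:
            klist.append(kmer)
    return klist


s = "XGATGGATG"  # GATG starts at positions 1 and 5
n = 4
print(repeated_kmers_loop(s, n, len(s) - 2 * n))      # [] (only i = 0 is tried)
print(repeated_kmers_loop(s, n, len(s) - 2 * n + 1))  # ['GATG']
```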