I'm given a DNA sequence, for example:

ATTAGGGCCCATTACGCTGACGAGCACTTG

I need to write a function that, given two inputs (the DNA sequence and one of A, C, G or T), determines the length of the longest stretch of the sequence that contains only that letter.

dna = 'ATTAGGGCCCATTACGCTGACGAGCACTTG';
giveLength(dna, 'A') 
ans = 
       1
giveLength(dna, 'C') 
ans = 
       3    

I had started out like this:

function length = giveLength(sequence, amino)
[begin, end] = regexp(sequence, amino , 'start', 'end')
pos = 1;                            
if isempty(begin)
    error('Doesn''t exist!')
else
for i = 1:length(begin)
    if begin(i) ~= end(i)          
        if (end(i) - begin(i)) > (end(i) - begin(i)) || (end(i) - begin(i)) > 1
            pos = end(i) - begin(i);
        end
    end
end
length = pos;
end

Obviously this doesn't work. With a single letter as the pattern, every match is one character long, so the begin and end positions are always the same. I also can't simply write amino+ to make the pattern match a whole run of that letter.
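A quick check (a sketch; the variable names s and e are illustrative) confirms why the attempt above fails: with a single-letter pattern, regexp returns one match per character, so the start and end indices are identical.

```matlab
dna = 'ATTAGGGCCCATTACGCTGACGAGCACTTG';
% A single-letter pattern matches one character at a time
[s, e] = regexp(dna, 'C', 'start', 'end');
disp(isequal(s, e))   % prints 1: every match starts and ends at the same index
```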

Help would be highly appreciated!

Nicolas

2 Answers


You can just subtract the start indices from the end indices (and add 1) to get the length of each matching run; the maximum of those is the longest run. This version is also a bit more general: you can pass it a sequence such as 'AG' and it will return the maximum number of consecutive repetitions of that pattern.

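function len = giveLength(sequence, amino)
% Match maximal runs of the pattern, e.g. '(C)+' or '(AG)+'
[begin_i, end_i] = regexp(sequence, sprintf('(%s)+', amino) , 'start', 'end');
if isempty(begin_i)
    error('Doesn''t exist!')
else
    % A run spans end_i - begin_i + 1 characters; dividing by the pattern length counts repetitions
    len = (max(end_i - begin_i) + 1) / numel(amino);
end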

BTW, try to avoid variable names such as length, end, etc., which are built-in functions or keywords.
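As a sketch of how the '(X)+' pattern from this answer behaves, the script below applies it to each base of the example sequence and also checks the multi-letter 'AG' case. One assumption differs from the function above: a base that never occurs is reported as 0 here instead of raising an error. The names bases, longest and reps are illustrative.

```matlab
% Sketch: apply the '(X)+' pattern to each base in turn.
% Assumption: a base that never occurs is reported as 0 instead of raising an error.
dna = 'ATTAGGGCCCATTACGCTGACGAGCACTTG';
bases = 'ACGT';
longest = zeros(1, numel(bases));
for k = 1:numel(bases)
    [s, e] = regexp(dna, sprintf('(%s)+', bases(k)), 'start', 'end');
    if ~isempty(s)
        longest(k) = max(e - s + 1);   % length of the longest run of bases(k)
    end
end
disp(longest)   % longest runs of A, C, G, T

% The same pattern counts repetitions of a multi-letter unit such as 'AG'
[s, e] = regexp('GAGAGCT', '(AG)+', 'start', 'end');
reps = max(e - s + 1) / numel('AG');   % 'AGAG' spans 4 characters, i.e. 2 repetitions
```

For the example sequence this gives 1, 3, 3 and 2 for A, C, G and T, which matches the expected outputs in the question for A and C.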

zeeMonkeez
Thanks, this works! Yeah, I didn't use those variable names in my real script, though. I only translated them to English and then forgot to rename them. Thanks for the tip! – Nicolas Feb 09 '16 at 16:56

I would use this answer and adapt it to your needs.

J = find(diff([dna(1)-1, dna]));         % start index of every run of identical letters
repetition = diff([J, numel(dna)+1]);    % length of each run
symbol = dna(J);                         % letter of each run

With this preprocessing done, you can query the longest run for a given symbol:

max(repetition(symbol=='C'))
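To see what the preprocessing produces, here is the same run-length idea traced on a short, made-up sequence 'AACGGG'. This is a sketch: the input is illustrative, and double() is applied explicitly so the arithmetic does not rely on how MATLAB concatenates char and double values.

```matlab
% Run-length encoding of a short example sequence (illustrative input)
dna = 'AACGGG';
d = double(dna);                              % work on character codes explicitly
J = find(diff([d(1)-1, d]));                  % run starts: [1 3 4]
repetition = diff([J, numel(dna)+1]);         % run lengths: [2 1 3]
symbol = dna(J);                              % run letters: 'ACG'
longestG = max(repetition(symbol == 'G'));    % 3
```

Note that max of an empty selection returns [] rather than 0, so a base that does not occur in the sequence needs a separate check.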
Daniel