I'm trying to find the length of the longest possible string of consecutive digits that contains no repeated 3-mers.

This is a bioinformatics question; I'm trying to solve it for protein sequences.

Basically, something like 0102340109 does not work because 010 repeats.

But something like 0002223589765 works because it contains no repeated run of 3 digits.

I need to find the longest sequence and I'm kinda stuck and clueless.

JY078
  • Do you really have nine distinct digits in your actual application, or is the real number there considerably different? Would `701080109` contain a repeated `010` even though the string in between is not a multiple of three digits in length? How about overlapping repeats? Would `01010` be illegal because there are two occurrences of `010` even though they overlap? – MvG Nov 02 '16 at 08:02
  • I think this is the reference you need: https://en.wikipedia.org/wiki/De_Bruijn_sequence – Bill Nov 02 '16 at 08:51
  • Thank you so much! I never realized this is actually a De Bruijn sequence. – JY078 Nov 02 '16 at 17:49

1 Answer

The following code is written in ES6. You can write a sliding procedure that takes a window length m and a step n, then a string input, and returns an iterable of substring "windows":

Array.from(sliding (3,1) ('012345'))
// [ '012', '123', '234', '345' ]

Array.from(sliding (2,2) ('012345'))
// [ '01', '23', '45' ]

Array.from(sliding (4,2) ('012345'))
// [ '0123', '2345' ]

Then, using this, you can define a seqIsRepeated procedure that iterates through the sliding windows. Instead of precomputing the entire list of windows, it looks at them one by one, adding each to a Set. If a window is already in the Set, it returns true immediately and iteration stops. If it gets through all the windows without finding a duplicate, it returns false. Because the windows advance one character at a time, overlapping occurrences (as in 01010) count as repeats too.

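// sliding (m,n) yields each window of length m, advancing n characters per step
const sliding = (m,n) => function* (xs) {
  for (let i = 0; i + m <= xs.length; i += n)
    yield xs.slice(i, i + m); // slice instead of the deprecated substr
};

// seqIsRepeated (n) returns true as soon as a window of length n occurs twice
const seqIsRepeated = n => xs => {
  const seen = new Set();
  for (const seq of sliding (n,1) (xs)) {
    if (seen.has(seq))
      return true;
    seen.add(seq);
  }
  return false;
};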

console.log (seqIsRepeated (3) ('0102340109'));    // true
console.log (seqIsRepeated (3) ('0002223589765')); // false

This doesn't find the longest sequence for you, but hopefully it gives you a start. From here, you'd look at substrings of your input sequence and use seqIsRepeated (3) to rule out the ones that contain a repeat.
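One way to carry out that last step, as a rough sketch rather than a definitive solution: instead of testing every substring separately, keep a single candidate substring and move its left edge forward whenever a window repeats inside it. The name longestWithoutRepeat is hypothetical (it is not part of the code above), and the sketch assumes overlapping repeats count as repeats.

```javascript
// Hypothetical helper: returns the longest substring of xs in which
// no window of length n occurs twice (the first such substring on ties).
const longestWithoutRepeat = n => xs => {
  const lastSeen = new Map(); // window -> start index of its latest occurrence
  let start = 0;              // left edge of the current candidate substring
  let best = '';
  for (let i = 0; i + n <= xs.length; i++) {
    const seq = xs.slice(i, i + n);
    const prev = lastSeen.get(seq);
    // If this window already starts inside the candidate, move the left
    // edge just past the earlier occurrence.
    if (prev !== undefined && prev >= start)
      start = prev + 1;
    lastSeen.set(seq, i);
    // The candidate runs from start to the end of the current window.
    if (i + n - start > best.length)
      best = xs.slice(start, i + n);
  }
  // An input shorter than n contains no window at all, so it qualifies whole.
  return xs.length < n ? xs : best;
};

console.log (longestWithoutRepeat (3) ('0102340109'));    // '102340109'
console.log (longestWithoutRepeat (3) ('0002223589765')); // '0002223589765'
```

This only searches within a given input. For the longest possible string overall, note that an alphabet of k symbols has only k^n distinct windows of length n, so no repeat-free string can be longer than k^n + n - 1 (1002 for digits with n = 3). The De Bruijn sequence linked in the comments is the construction that reaches that bound.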

Mulan