Assume we have n three-letter substrings. The question is whether a string of length n+2 can be built from these n substrings by concatenating them (overlapping letters are written only once). The resulting string must have the form a1, a2, a3, a4, ...

So two substrings may only be linked if they overlap in two adjacent positions: 'yxz' + 'xzw' = 'yxzw', but 'yxz' + 'aby', for example, is not allowed.
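To make the linking rule concrete, here is a minimal sketch in Python. The function name `link` is hypothetical (it is not part of the original question); it only assumes that each input is a string of at least two letters.

```python
from typing import Optional


def link(left: str, right: str) -> Optional[str]:
    """Merge `right` onto `left` if the last two letters of `left`
    equal the first two letters of `right`; otherwise return None.

    `link` is an illustrative helper name, not something from the question.
    """
    if left[-2:] == right[:2]:
        # The two overlapping letters are written only once.
        return left + right[2:]
    return None


print(link('yxz', 'xzw'))  # allowed: the overlap is 'xz'
print(link('yxz', 'aby'))  # not allowed: 'xz' != 'ab'
```

Because the result of a successful link still ends in the last two letters of `right`, calls can be chained, e.g. `link(link('abc', 'bcd'), 'cde')`.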

Example 1: The n = 3 three-letter substrings are 'abc', 'cde', 'bcd'. Output: YES, because 'abc' + 'bcd' + 'cde' = 'abcde' is a valid string with n+2 = 5 letters.

Example 2: The n = 3 three-letter substrings are 'abc', 'bca', 'bcd'. Output: NO, because it is not possible to concatenate them all.

How can I find an efficient algorithm for this problem? Trying all possible orderings takes far too long, at O(n!).

Diefapa
  • I think you can use 2 maps. How large can `n` be, though? – nice_dev Mar 08 '19 at 13:25
  • @vivek_23 the maximum value of n is approximately in the order of 10^4 – Diefapa Mar 08 '19 at 13:29
  • Looks like you will need to use backtracking for this. Because in case of a clash, we can't guarantee which one leads us to a solution consuming all substrings. Also, can a single substring repeat? – nice_dev Mar 08 '19 at 13:50
  • Why does `Example 2` have no solution? I see `bca` + `abc` => `bcabc`, 5 characters from a set of 3 strings. Do we have to use all strings in the set? Must they be used in order? – Prune Mar 08 '19 at 19:19
  • @Prune I was under the impression they must overlap exactly on two characters (end + beginning). Hence the OP's example, 'yxz' + 'xzw' = 'yxzw' – גלעד ברקן Mar 08 '19 at 19:37
  • Yes, that's my impression as well, but the wording isn't crisp enough for me to be certain. I asked to make sure that the answers already posted will, indeed, solve OP's problem. – Prune Mar 08 '19 at 19:41

1 Answer

One popular approach to this kind of problem is to build the overlap graph of the input sequences. Its vertices are your triplets, and an arc a_i -> a_j between two triplets means that the last two letters of a_i are the first two letters of a_j. You then look for a Hamiltonian path in the resulting graph: a path that visits every triplet exactly once.

A naïve search would of course not outperform the exhaustive search you mention, but the linked Wikipedia article gives some leads on how to do this more efficiently.
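As an illustration of the overlap-graph idea, here is a rough sketch in Python. The names `build_overlap_graph`, `hamiltonian_path` and `assemble` are hypothetical, and the search is a plain backtracking one, so it is exponential in the worst case and is not one of the more efficient methods. It is also recursive, so for n around 10^4 it would need an iterative rewrite or a raised recursion limit.

```python
from collections import defaultdict


def build_overlap_graph(triplets):
    """Adjacency lists: arc i -> j when the last two letters of
    triplets[i] are the first two letters of triplets[j]."""
    by_prefix = defaultdict(list)
    for j, t in enumerate(triplets):
        by_prefix[t[:2]].append(j)
    # Skip j == i so a triplet such as 'aaa' does not link to itself.
    return [[j for j in by_prefix[t[1:]] if j != i]
            for i, t in enumerate(triplets)]


def hamiltonian_path(graph):
    """Naive backtracking search for a path visiting every vertex once.
    Returns a list of vertex indices, or None if no such path exists."""
    n = len(graph)
    visited = [False] * n
    path = []

    def extend(v):
        visited[v] = True
        path.append(v)
        if len(path) == n:
            return True
        for w in graph[v]:
            if not visited[w] and extend(w):
                return True
        # Dead end: undo the choice and let the caller try another arc.
        visited[v] = False
        path.pop()
        return False

    for start in range(n):
        if extend(start):
            return path
    return None


def assemble(triplets):
    """Return the string of length n+2 if all triplets can be chained, else None."""
    order = hamiltonian_path(build_overlap_graph(triplets))
    if order is None:
        return None
    # First triplet in full, then only the new last letter of each later one.
    return triplets[order[0]] + "".join(triplets[i][2] for i in order[1:])


print(assemble(['abc', 'cde', 'bcd']))  # Example 1 from the question
print(assemble(['abc', 'bca', 'bcd']))  # Example 2 from the question
```

Grouping the triplets by their two-letter prefix keeps graph construction close to linear in the number of arcs, instead of comparing every pair of triplets.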

Anthony Labarre