
I need to find the minimum number of insertions needed to convert a string into a palindrome. Note: the insertions can happen anywhere, at either end or within the string. If insertions were only allowed at the end, there is already a question here.

So I found out that this can be done in O(N**2) time by this simple trick:

  1. Let the string be s1. Reverse it. Let it be s2. Say the length is l.
  2. Now find the longest common subsequence of s1 and s2. Let its length be x.
  3. The answer is l-x.

For example, suppose s1 = abcda. Therefore s2 = adcba. The length is 5. A longest common subsequence is aba, of length 3. So the minimal number of insertions is 5-3 = 2, which is the actual answer, with the resulting string being adcbcda.
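The three steps above can be sketched in Python as follows. This is only a minimal sketch: the names `lcs_length` and `min_insertions` are made up for illustration, and the LCS is computed with the standard O(N**2) dynamic programme, keeping just two rows of the table.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)  # DP row for the prefix of a processed so far
    for ch in a:
        cur = [0]
        for j in range(1, len(b) + 1):
            if ch == b[j - 1]:
                cur.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))   # drop one character
        prev = cur
    return prev[-1]


def min_insertions(s: str) -> int:
    """Step 1: reverse s. Step 2: x = LCS length. Step 3: answer is l - x."""
    return len(s) - lcs_length(s, s[::-1])


print(min_insertions("abcda"))  # 2
```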

However, I cannot understand the logic behind it. Can anyone explain to me why it works?

And, is there any O(N) solution possible for this?

SexyBeast
  • Have a look at this [link](http://cs.stackexchange.com/questions/52416/proof-for-minimum-number-of-insertions-to-convert-a-string-to-a-palindrome) – imharindersingh Apr 27 '16 at 17:14

1 Answer


I don't know whether there is an O(N) solution, but by comparing the string with its reverse, you find a subsequence which is a palindrome. That leaves l-x letters that are not paired. (You can think of a letter's pair as its reflection if you put a mirror right at the middle of the word, e.g. ab|ba.) Then, by inserting the missing reflections, you just complete the picture.

Now, firstly, how do we find a longest subsequence that is common to two strings? There is a polynomial-time algorithm for finding it; see https://en.wikipedia.org/wiki/Longest_common_subsequence_problem

When we try to find the longest common subsequence (LCS) between s1 and s2 (the reverse of s1), we are actually finding the LCS between the first half of s1 and the first half of s2, and also between the second half of s1 and the second half of s2. Assume

s1 = abcddzac

so s2 = cazddcba. Here we can see it as a comparison of abcd with cazd (the first halves) plus a comparison of dzac with dcba (the second halves). The two comparisons are the same except that they are reverses of each other, so their concatenation is a palindrome. In other words, an LCS of s1 and s2 can always be chosen to be a palindrome.
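One way to sanity-check this is to compute the longest palindromic subsequence of s1 directly, without reversing anything, and compare its length with x. The sketch below is an illustration only; `lps_length` is a hypothetical helper name, and it uses the usual interval dynamic programme over substrings s[i..j].

```python
def lps_length(s: str) -> int:
    """Length of a longest palindromic subsequence of s, in O(N**2) time."""
    n = len(s)
    if n == 0:
        return 0
    # best[i][j] = longest palindromic subsequence inside s[i..j] (inclusive);
    # entries with i > j stay 0 and stand for the empty interval.
    best = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        best[i][i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                best[i][j] = best[i + 1][j - 1] + 2   # s[i] and s[j] mirror each other
            else:
                best[i][j] = max(best[i + 1][j], best[i][j - 1])
    return best[0][n - 1]


print(lps_length("abcddzac"))  # 4, e.g. adda or cddc
```

For abcddzac this gives 4, the same value as the length of the LCS of s1 and s2.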

Once we have the LCS (ad|da), which is of length 4, we have 4 more letters that break the symmetry (b, c, z, c). We then insert one letter for each of them to restore the symmetry, i.e. to get a palindrome. We take the middle of the LCS as our middle point and break s1 in two at that point, so we have

s1 = a bc d|d z a c, and we break it in two at d|d, like a stick. Reading each half outward from the break, we end up with:
dzac
dcba

Now we simply fill in between the letters of the LCS until the two halves are the same. In our case the steps are as follows:

dzac
dcba

dzac
dzcba

dzcac
dzcba

dzcbac
dzcba

dzcbac
dzcbac


Now we rejoin the two halves at the same point and we have
cabczddzcbac, which is a palindrome.
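The construction can also be done mechanically. The sketch below is not the exact procedure above but a related one, and `make_palindrome` is a made-up name: it computes the minimal insertion count for every substring, then walks inward from both ends, mirroring an unmatched letter on the opposite side whenever the two ends differ. Several minimal palindromes can exist, so the string it returns may differ from the one built by hand.

```python
def make_palindrome(s: str) -> str:
    """Return one shortest palindrome obtainable from s by insertions only."""
    n = len(s)
    # cost[i][j] = fewest insertions that turn s[i..j] into a palindrome
    cost = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if s[i] == s[j]:
                cost[i][j] = cost[i + 1][j - 1]
            else:
                cost[i][j] = 1 + min(cost[i + 1][j], cost[i][j - 1])

    left, right = [], []          # right is collected from the outside in
    i, j = 0, n - 1
    while i < j:
        if s[i] == s[j]:          # both ends already mirror each other
            left.append(s[i]); right.append(s[j])
            i += 1; j -= 1
        elif cost[i + 1][j] <= cost[i][j - 1]:
            left.append(s[i]); right.append(s[i])   # insert a copy of s[i] on the right
            i += 1
        else:
            left.append(s[j]); right.append(s[j])   # insert a copy of s[j] on the left
            j -= 1
    middle = s[i] if i == j else ""
    return "".join(left) + middle + "".join(reversed(right))


p = make_palindrome("abcddzac")
print(p, len(p))  # a palindrome of length 12
```

The result has length l + (l-x), because each mismatch consumes one letter of s1 and adds exactly one inserted letter.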

Note: cddc is also an LCS, but choosing it doesn't change the number of steps.

MGoksu
  • Can you explain in a bit more detail? It is still not clear to me. – SexyBeast Apr 27 '16 at 16:27
  • Okay, I got this part right, that the LCS of the string and its reverse has to be a palindrome. So for your example, the palindrome is `adda` of length 4. But then how are you inserting the remaining 4 characters into it to maintain the palindrome state? – SexyBeast Apr 27 '16 at 19:43
  • I edited that part but at that point you already have the answer `l-x` – MGoksu Apr 27 '16 at 21:08
  • I am sorry, but I still don't get it. What is it with the sequence of words you have given? It starts with `dzac` and `dcba`, which are the two halves of the string, that is all right. Then? You insert b, c, z and c in order. Who determines the order (I guess there isn't one)? And where do you insert them and why? – SexyBeast Apr 27 '16 at 22:07
  • It's OK. In the sequence, each pair of lines is the two halves that we want to make equal to get a palindrome. And yes, there is no fixed order of insertion. Insertion is trivial here. In the example, `d` and `a` are two letters from the LCS. What's between them is z in one case and cb in the other. We want them to be equal at the end. We have `d`z`a` and `d`cb`a`, so we can arrive at `d`zcb`a`, `d`czb`a` or `d`cbz`a`. All of them are OK. Then after `a`, we have c in one case and nothing in the other, so we just insert c and that makes `a`c. Finally we have `d`zcb`a`c (second half) and its reverse is c`a`bcz`d` (first half), giving a new s1 of length 12. – MGoksu Apr 28 '16 at 07:53