3

Suppose you have two strings. Each string consists of lines separated by a newline character. You want to compare the two strings and find the shortest sequence of steps that transforms the second string into the first, where each step either adds or deletes one line.

For example:

string #2:

abc
def
efg
hello
123

and string #1:

abc
def
efg
adc
123

The best (fewest steps) solution to transform string #2 into string #1 would be:

  1. remove the line at position 4 ('hello')
  2. add 'adc' after line position 3

How would one write a generic algorithm that finds the solution with the fewest steps for transforming one string into another, given that you can only add or remove lines?

  • Is this for a test or something like that? The solution can vary depending on which C features you are allowed to use. And when you say "remove line at line position 3 ('hello')", it sounds like you are talking about an array of strings. If you just have a single string, "hello" begins at index 12 of string #2. – aurelienC Dec 11 '15 at 10:19

3 Answers

4

This is a classic problem.

For a given set of allowed operations the edit distance between two strings is the minimal number of operations required to transform one into the other.

When the set of allowed operations consists of insertion and deletion only, this distance is known as the longest common subsequence (LCS) edit distance.

You'll find everything you need to compute this distance in Longest common subsequence problem.
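To make this concrete for the line-based case in the question, here is a minimal C sketch. It assumes both strings have already been split into arrays of lines, and the function name `line_diff` is hypothetical, not part of any library. It builds the LCS table over suffixes, then walks it forward to print one shortest add/remove script. The number of steps always equals n + m - 2 * LCS(n, m), where n and m are the line counts.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prints one shortest add/remove script that turns
 * `from` (n lines) into `to` (m lines) and returns the number of steps,
 * or -1 if memory allocation fails.
 * Removed lines are numbered by their position in `from`, added lines by
 * their position in `to` (both 1-based). */
int line_diff(const char *const *from, int n,
              const char *const *to, int m, FILE *out)
{
    assert(n >= 0 && m >= 0 && out != NULL);

    /* lcs[i*w + j] = LCS length of from[i..n) and to[j..m). */
    size_t w = (size_t)m + 1;
    int *lcs = calloc(((size_t)n + 1) * w, sizeof *lcs);
    if (lcs == NULL)
        return -1;

    /* Fill the table from the bottom-right corner upwards. */
    for (int i = n - 1; i >= 0; i--) {
        for (int j = m - 1; j >= 0; j--) {
            if (strcmp(from[i], to[j]) == 0) {
                lcs[i * w + j] = lcs[(i + 1) * w + j + 1] + 1;
            } else {
                int down = lcs[(i + 1) * w + j];
                int right = lcs[i * w + j + 1];
                lcs[i * w + j] = down > right ? down : right;
            }
        }
    }

    /* Walk forward, always moving in a direction that keeps the LCS. */
    int i = 0, j = 0, steps = 0;
    while (i < n || j < m) {
        if (i < n && j < m && strcmp(from[i], to[j]) == 0) {
            i++;            /* common line: keep it */
            j++;
        } else if (j == m ||
                   (i < n && lcs[(i + 1) * w + j] >= lcs[i * w + j + 1])) {
            fprintf(out, "remove line %d ('%s')\n", i + 1, from[i]);
            i++;
            steps++;
        } else {
            fprintf(out, "add '%s' as line %d\n", to[j], j + 1);
            j++;
            steps++;
        }
    }

    free(lcs);
    return steps;
}
```

Called with string #2 as `from`, string #1 as `to` and `stdout` as `out`, this reports two steps for the example in the question: removing 'hello' and adding 'adc'. The full table costs O(n * m) memory. If only the distance is needed, two rows are enough.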

0

Note that to answer this question fully, one would have to thoroughly cover the huge subject of graph similarity search / graph edit distance, which I will not do here. I will, however, point you in directions where you can study the problem more thoroughly on your own.


... to find the quickest, least steps, solutions for transforming one string to another ...

This is a common problem known as the (minimum) edit distance problem, originally studied as the 'String-to-String Correction Problem' by R. Wagner and M. Fischer. Finding the optimal (minimum, i.e. fewest steps) edit distance, which is what you ask for in your question, is non-trivial.

See e.g.:

https://en.wikipedia.org/wiki/Edit_distance

https://web.stanford.edu/class/cs124/lec/med.pdf
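As a rough illustration of the dynamic-programming approach covered in the references above (commonly called the Wagner-Fischer algorithm), here is a minimal C sketch. Note the assumptions: it works on characters rather than lines, and it allows substitution at cost 1 in addition to insertion and deletion, so it is the general edit distance rather than the add/remove-only variant from the question. The names `edit_distance` and `min3` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical helper: minimum number of single-character insertions,
 * deletions and substitutions needed to turn `a` into `b`.
 * Returns (size_t)-1 if memory allocation fails. */
size_t edit_distance(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t n = strlen(a), m = strlen(b);

    /* row[j] holds D[i][j]; only one row of the table is kept. */
    size_t *row = malloc((m + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    /* D[0][j] = j: build the first j characters of b by insertion. */
    for (size_t j = 0; j <= m; j++)
        row[j] = j;

    for (size_t i = 1; i <= n; i++) {
        size_t diag = row[0];   /* D[i-1][j-1] for j = 1 */
        row[0] = i;             /* D[i][0] = i deletions */
        for (size_t j = 1; j <= m; j++) {
            size_t up = row[j]; /* D[i-1][j], saved before overwriting */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min3(up + 1,          /* delete a[i-1]     */
                          row[j - 1] + 1,  /* insert b[j-1]     */
                          diag + cost);    /* match/substitute  */
            diag = up;
        }
    }

    size_t d = row[m];
    free(row);
    return d;
}
```

To restrict this to insertion and deletion only, as in the question, one option is to drop the substitution case for unequal characters (equivalently, give it cost 2), which yields the LCS-based distance.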

The minimum edit distance problem for string similarity is in itself a subclass of the more general minimum graph edit distance problem, or graph similarity search (since any string or even sequenced object, as you have noted yourself, can be represented as a graph), see e.g. A survey on graph edit distance.

For details regarding this problem here on SO, refer to e.g. Edit Distance Algorithm and Faster edit distance algorithm.

This should get you started.

I'd tag this as a math problem (algorithmic instructions) rather than a language-specific one, unless someone can point you to an existing C library for solving edit distance problems.

dfrib
0

The fastest way would be to remove all sub-strings, then append (not insert) all new sub-strings; and to do "all sub-strings at once" if you can (possibly leading to a destPointer = sourcePointer approach).

The overhead of minimising the number of sub-strings removed and inserted will be higher than the cost of removing and inserting/appending without checking whether it's necessary. It's like spending $100 to pay a consultant to determine if you should spend $5.

Brendan