I stumbled upon an algorithmic problem which, in short, could be stated as follows:
We have n words as input. The task is to compress them into one new word, as short as possible, that contains every "old" word as a connected subword (that is, you can obtain any of the "old" words by crossing out all the letters before and after some interval of the new word).
Example:
{aabb, bbcc, c} could be compressed as {aabbcc}
{a,a,a,a,a,a} could be compressed as {a}
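To make the requirement concrete, here is a minimal sketch of a validity check, assuming words are plain Python strings. The function name `is_valid_compression` is made up for illustration; it only tests that a candidate contains every input word as a contiguous substring, not that the candidate is as short as possible.

```python
def is_valid_compression(candidate: str, words: list[str]) -> bool:
    """Return True if every input word is a contiguous substring of candidate."""
    # Python's `in` on strings is a contiguous-substring test, which matches
    # "cross out everything before and after some interval".
    return all(word in candidate for word in words)


print(is_valid_compression("aabbcc", ["aabb", "bbcc", "c"]))  # True
print(is_valid_compression("a", ["a"] * 6))                    # True
print(is_valid_compression("abc", ["ac"]))                     # False: "ac" is not contiguous in "abc"
```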
The only idea I have is the dumbest brute force one could think of: take the first word and, for every other word, check how long their maximum common part is by trying to match the end of one against the start of the other. Connect the pair that gave the biggest overlap and replace both words in the list with the newly created word. Repeat until only one word is left.
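The procedure just described might be sketched roughly as follows. This is one reading of it, not a reference implementation: the helper names `overlap` and `greedy_compress` are made up, ties are broken in favour of the first pairing found, and the overlap search is deliberately naive.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_compress(words: list[str]) -> str:
    """Repeatedly merge the first word with the word it overlaps most."""
    pool = list(words)
    if not pool:
        return ""
    while len(pool) > 1:
        first = pool[0]
        best = None  # (overlap length, index of partner, merged word)
        for i in range(1, len(pool)):
            other = pool[i]
            # Try both orientations: first followed by other, and other followed by first.
            for left, right in ((first, other), (other, first)):
                k = overlap(left, right)
                if best is None or k > best[0]:
                    best = (k, i, left + right[k:])
        _, i, merged = best
        # Replace both merged words with the newly created one.
        pool = [merged] + pool[1:i] + pool[i + 1:]
    return pool[0]


print(greedy_compress(["aabb", "bbcc", "c"]))  # aabbcc
print(greedy_compress(["a"] * 6))              # a
```

Each round scans the whole list and each overlap check is itself quadratic in the word length, so the sketch is slow even for moderate inputs.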
The problem with this solution is not only that it's going to be tragically slow, but also that it gives bad answers in some cases. Say we have {aabb, bbaa, b}: it would connect the first two because their overlap is 2, versus only 1 with the third word. Thus we get {aabbaa, b} => {aabbaab}, while we could have had {aabbaa}, which already contains b. One way to address this would be to rank overlaps by the percentage of a word they cover rather than by their bare length, but I'm not sure whether such tweaking of a faulty approach is a good idea…
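To illustrate the percentage idea, here is a small sketch of one possible scoring function. It assumes "percentage" means the overlap length divided by the length of the shorter of the two words, which is only one way to read it; `relative_score` is a made-up name and the words are assumed to be non-empty.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def relative_score(a: str, b: str) -> float:
    """Best overlap in either orientation, as a fraction of the shorter word.

    Assumes both words are non-empty (otherwise this divides by zero).
    """
    best = max(overlap(a, b), overlap(b, a))
    return best / min(len(a), len(b))


print(relative_score("aabb", "bbaa"))  # 0.5
print(relative_score("aabb", "b"))     # 1.0
```

Under this score the pair (aabb, b) outranks (aabb, bbaa) in the example, so b would be absorbed first. Whether the tweak helps in general is exactly the open question.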
What would you suggest for such a problem to get the best results in the shortest time?