OK this is what I want to do:
Get more than two strings and "align" them (no DNA/RNA sequence or the like, just regular strings with not like 1000 items in each of them)
I've already done some work with pairwise alignment (align two strings) however the "gaps" create some issues for me when trying to align more than one pair.
Example (one I'm currently testing):
ABCDEF
ABGHCEEF
AJKLBCDYEOF
AB--CDEF
ABGHCEEF
=======================
AB--C-EF
A-B--C--E-F
AJKLBCDYEOF
=======================
A----C--E-F
And another (more illustrative) example :
http://nest.drkameleon.com
http://www.google.com
http://www.yahoo.com
http://nest.drkameleon.com
http://-www.--google--.com
=======================
http://----.------le--.com
http://----.------le--.com
http://-www.-----yahoo.com
=======================
http://----.----------.com
What I'm currently doing :
- Sort the strings (longer strings come first in the list)
- Align the first pair : A-B and get the result (let's say
R1
) - Then align the second pair :
R1
andC
(result inR2
) - Then align the third pair :
R2
andD
- And so on...
So what's in your mind? How could I go for that? Is there a better way? (Of course, there must be...)
I'd rather do that in Perl/Python or something along these lines, however any type of code/reference would be more than welcome! :-)