Hi
I have two sequences of numerical data, say:
S1: 1,6,4,9,8,7,5 and S2: 6,9,7,5
I'd like to find a sequence alignment between them in both directions, left-to-right and right-to-left.
I tried two techniques before asking. First I used the Hungarian algorithm, but it ignores the order of the elements, so it doesn't give good results.
I also used a modified version of the Needleman–Wunsch algorithm, but I think I may be doing it wrong. I've been digging for at least 4 months for anything that could help, but I only find genetic algorithms, which may be helpful. Is there an existing algorithm that I haven't come across yet?
So, to formalise my question: how would you align two sequences of positive numbers (integer or double)?
Viewed 575 times
1

Chakib Mataoui
-
What would be the expected output for the 2 sequences shown in your question? And what's the explanation for how you got that output? – Bernhard Barker May 14 '17 at 22:34
-
Shouldn't you maybe try to figure out what's wrong with the code you've written thus far (through debugging) instead of scrapping it and asking for a different approach? – Bernhard Barker May 14 '17 at 22:41
-
@Dukeling Let's say the output would be an array of the elements that match best (not necessarily a perfect match, but at least the nearest elements possible). This example is arbitrary and happens to give a perfect match: ([2,1],[4,2],[6,3],[7,4]), where the left element is an index in S1 and the right element is the index of the matching element in S2. From that I would calculate a distance between the two sequences: the sum of the unmatched elements + the sum of the absolute differences of the matched elements. – Chakib Mataoui May 14 '17 at 22:42
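As a rough illustration of the distance described in this comment, here is a minimal Python sketch. The function name `match_distance` is hypothetical, and it assumes that "the sum of the unmatched elements" means the sum of the values of the elements left without a partner in either sequence, with 1-based index pairs as in the example.

```python
def match_distance(s1, s2, pairs):
    """Distance for a given matching; pairs are 1-based (index in s1, index in s2).

    Assumption: an unmatched element contributes its own value to the distance.
    """
    matched1 = {i for i, _ in pairs}
    matched2 = {j for _, j in pairs}
    # Cost of the matched pairs: absolute difference of the two values.
    matched_cost = sum(abs(s1[i - 1] - s2[j - 1]) for i, j in pairs)
    # Cost of the unmatched elements on both sides.
    unmatched_cost = (
        sum(v for k, v in enumerate(s1, start=1) if k not in matched1)
        + sum(v for k, v in enumerate(s2, start=1) if k not in matched2)
    )
    return matched_cost + unmatched_cost


S1 = [1, 6, 4, 9, 8, 7, 5]
S2 = [6, 9, 7, 5]
print(match_distance(S1, S2, [(2, 1), (4, 2), (6, 3), (7, 4)]))  # 13
```

For the example matching, every matched pair has difference 0 and the unmatched elements of S1 are 1, 4 and 8, so the distance is 13.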
-
@Dukeling I wouldn't be asking for another approach if I didn't think the method itself was the wrong one, sir. I'm not the one who proposed using it, and I've worked very hard to find another approach without success. That's why I'm hoping that people with knowledge I don't have access to can help, even with just a hint, so I can go on and find a solution. – Chakib Mataoui May 14 '17 at 22:45
-
That sounds pretty similar to [the longest common subsequence problem](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem). You'd need to concretely and exactly define how you'd compare non-exact matches to determine the best one if you want help with that part. – Bernhard Barker May 14 '17 at 22:51
-
Thank you very much @Dukeling, but I'm already aware of what kind of problem I'm dealing with. The issue is that the solutions for the longest common subsequence are made for strings and are used mainly in genetics algorithms. What about other data, which in my case are numbers? Do you have any idea? – Chakib Mataoui May 14 '17 at 23:01
-
@Chekbo Numbers are no problem. Strings are character sequences; you are dealing with number sequences. All that matters is whether two values are the same or not. Or do you have a problem with precision? Then you could allow a small relative error for two numbers to still be considered the same. – maraca May 15 '17 at 13:58
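The suggestion in this comment can be sketched as a longest common subsequence over numbers in which two values count as equal when they agree within a relative tolerance. This is only an illustrative sketch: the function name `lcs_pairs` and the default tolerance are assumptions, and `math.isclose` from the Python standard library is used for the comparison.

```python
import math


def lcs_pairs(s1, s2, rel_tol=1e-9):
    """Longest common subsequence of two number sequences.

    Two values are treated as equal when math.isclose() accepts them.
    Returns the matched positions as 1-based (index in s1, index in s2) pairs.
    """
    n, m = len(s1), len(s2)
    # length[i][j] = LCS length of the suffixes s1[i:] and s2[j:]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if math.isclose(s1[i], s2[j], rel_tol=rel_tol):
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk the table from the start to recover one optimal matching.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if math.isclose(s1[i], s2[j], rel_tol=rel_tol):
            pairs.append((i + 1, j + 1))
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


print(lcs_pairs([1, 6, 4, 9, 8, 7, 5], [6, 9, 7, 5]))
# [(2, 1), (4, 2), (6, 3), (7, 4)]
```

Note that this only pairs values that are (nearly) identical; it does not pick the "nearest" element when no close match exists.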
-
@maraca Thank you, but the numbers may be decimal values. I generalized the problem in order to find an alternative. In practice, I'm using this to calculate a distance between two sequences of areas of the parts of a shape, and I need that distance even when the two shapes don't have the same number of parts. – Chakib Mataoui May 15 '17 at 14:15
1 Answer
1
I believe you can accomplish your objective with the following:
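from Bio import pairwise2
from Bio.pairwise2 import format_alignment

seq1 = "1649875"
seq2 = "6975"

# Substitution scores for every pair of digits 0-9: the negated absolute
# difference, so identical digits score 0 and distant digits score lower.
numDict = {}
for x in range(0, 10):
    for y in range(0, 10):
        numDict[(str(x), str(y))] = -abs(x - y)
# print(numDict)

# globalds: global alignment scored from a dictionary, with the same gap
# penalties for both sequences (gap open = -3, gap extend = -1).
for a in pairwise2.align.globalds(seq1, seq2, numDict, -3, -1):
    print(format_alignment(*a))  # prints each alignment with the best score

# Alternative with flat match/mismatch scoring (match +5, mismatch -5):
# for a in pairwise2.align.globalms(seq1, seq2, 5, -5, -3, -1):
#     print(format_alignment(*a))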
The globalds alignment lets you score pairs with a custom dictionary (in this case, I created a dictionary covering every pair of digits from 0 to 9 and scored each pair by the negated absolute value of their difference). If you just want a flat yes/no scoring system, you could use something like globalms, where a match scores +5 and a mismatch scores -5. Note that I advise using gap penalties when performing alignments. Also, familiarize yourself with the difference between 'global' and 'local' alignments. More information on the pairwise2 Biopython module can be found here: http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html

Ghoti
-
Thank you for your answer, but I was actually looking for an alignment that works even when the numbers differ, because the sequence is actually a sequence of floats, and I wanted a more general solution. I did find one and just forgot to share it: I used the [Needleman–Wunsch algorithm](https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm), which I adapted to my specific problem. To compare the data I simply used absolute differences, with a gap penalty of max(seq) + 1. I don't know why the +1 is needed; it just worked that way for me, and I'll have to look further into how to improve it. – Chakib Mataoui Aug 05 '17 at 18:30
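The adapted code was not posted, so the following is only a minimal sketch of what such a Needleman–Wunsch variant could look like, not the author's implementation. It assumes a cost-minimizing formulation in which a matched pair costs the absolute difference of its values and every unmatched element costs a constant gap penalty, defaulting to the largest value in either sequence plus 1. The function name `align_numbers` is hypothetical, and both sequences are assumed to be non-empty.

```python
def align_numbers(s1, s2, gap=None):
    """Global alignment of two number sequences by dynamic programming.

    Minimizes: sum of |a - b| over matched pairs + gap * (unmatched elements).
    Returns (total cost, 1-based (index in s1, index in s2) pairs).
    """
    if gap is None:
        # Assumption: gap penalty of max(seq) + 1, taken over both sequences.
        gap = max(list(s1) + list(s2)) + 1
    n, m = len(s1), len(s2)
    # cost[i][j] = minimal cost of aligning s1[:i] with s2[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(s1[i - 1] - s2[j - 1]),  # match pair
                cost[i - 1][j] + gap,  # s1[i-1] left unmatched
                cost[i][j - 1] + gap,  # s2[j-1] left unmatched
            )
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(s1[i - 1] - s2[j - 1]):
            pairs.append((i, j))
            i -= 1
            j -= 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


total, pairs = align_numbers([1, 6, 4, 9, 8, 7, 5], [6, 9, 7, 5])
print(total, pairs)  # 30 [(2, 1), (4, 2), (6, 3), (7, 4)]
```

With the sequences from the question the gap penalty is 10, so the three unmatched elements of S1 cost 30 and the four matched pairs cost 0. One possible reading of the +1 is that it keeps a gap strictly more expensive than any single substitution, but that is an interpretation rather than something stated in the thread.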