How can I modify the Smith-Waterman algorithm using a substitution matrix to align proteins in Perl?

[citations needed]

Brad Gilbert
may
  • How about providing links to a description of the algorithm, a definition of the substitution matrix, and a definition of protein alignment? Otherwise, your question will be opaque to non-biologists (and even some biologists, I suspect). – Sinan Ünür Nov 09 '09 at 17:45
  • Do you want to modify the algorithm or just use a different scoring matrix? If you're looking to modify the algorithm, you should already have a solid understanding of the math as it stands and of the math as you want it to be. – dnagirl Nov 09 '09 at 17:52
  • I can't tell if this is a real question; I don't know enough of the field to know whether molecular biologists mean specific things by aligning proteins, or whether a substitution matrix is well-defined. The link given to the algorithm doesn't describe the algorithm, but is rather a brief description of how to use it. – David Thornley Nov 09 '09 at 18:18

2 Answers

I'm actually a bioinformatics researcher, and one that is waiting for his own bioinformatics code to run, so I'll attempt to answer your question even though it's rather poorly posed.

I'm not sure why you think you need to "modify" the Smith-Waterman algorithm. The only thing the Smith-Waterman algorithm needs to align proteins instead of DNA is a substitution matrix for proteins. Look into BLOSUM or PAM. These are derived from the substitution frequencies observed for amino acid pairs in curated alignments of related proteins: PAM from alignments of closely related sequences, BLOSUM from conserved, ungapped blocks in the BLOCKS database.

Constructing a substitution matrix for protein sequences is much more complicated than for DNA sequences. For example, you'd expect one hydrophilic amino acid to substitute for another relatively frequently, because the swap can often happen without the protein losing its function. However, you wouldn't expect a hydrophobic amino acid to substitute for a hydrophilic one as often, because this would change the protein structure more drastically.

If you view the substitution matrix as an input instead of part of the algorithm, the Smith-Waterman algorithm, while typically applied to DNA or proteins, is technically a general string alignment algorithm.
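To make the "matrix as input" point concrete, here is a minimal sketch in Perl. It only computes the best local alignment score (no traceback) and uses a simple linear gap penalty. The function name `sw_score`, the six-residue alphabet, and the scores 4 / 1 / -3 are all made up for illustration; they are not BLOSUM or PAM values, so in practice you would load a real matrix into the same hash-of-hashes shape.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Smith-Waterman local alignment score with a linear gap penalty.
# $subst is a hash of hashes: $subst->{$x}{$y} scores aligning $x with $y.
# $gap is the (negative) score added for each gap position.
# sw_score is a hypothetical helper name, not part of any library.
sub sw_score {
    my ($seq1, $seq2, $subst, $gap) = @_;
    my @s = split //, $seq1;
    my @t = split //, $seq2;

    # Row 0 and column 0 stay at zero: a local alignment may start anywhere.
    my @H = map { [ (0) x (@t + 1) ] } 0 .. scalar(@s);
    my $best = 0;

    for my $i (1 .. scalar(@s)) {
        for my $j (1 .. scalar(@t)) {
            my $pair = $subst->{ $s[$i - 1] }{ $t[$j - 1] };
            die "no score for $s[$i - 1]/$t[$j - 1]\n" unless defined $pair;

            $H[$i][$j] = max(
                0,                            # start a new local alignment
                $H[$i - 1][$j - 1] + $pair,   # align the two residues
                $H[$i - 1][$j] + $gap,        # gap in the second sequence
                $H[$i][$j - 1] + $gap,        # gap in the first sequence
            );
            $best = $H[$i][$j] if $H[$i][$j] > $best;
        }
    }
    return $best;
}

# Toy matrix (illustrative values only, NOT BLOSUM/PAM):
# identical residues score 4, same class scores 1, different class scores -3.
my %class = (
    L => 'hydrophobic', I => 'hydrophobic', V => 'hydrophobic',
    D => 'hydrophilic', E => 'hydrophilic', K => 'hydrophilic',
);
my %toy;
for my $x (keys %class) {
    for my $y (keys %class) {
        $toy{$x}{$y} = $x eq $y                  ? 4
                     : $class{$x} eq $class{$y}  ? 1
                     :                            -3;
    }
}

print sw_score('LIVE', 'LVVE', \%toy, -4), "\n";   # prints 13
print sw_score('LIV',  'DEK',  \%toy, -4), "\n";   # prints 0
```

Nothing in `sw_score` knows about proteins. Pass it a match/mismatch table over A, C, G, T and it scores DNA; pass it a protein matrix and it scores proteins. That is why swapping the matrix is usually all the "modification" you need.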

dsimcha

Maybe start with Bio::Tools::pSW, try to modify it the way you want, and ask specific questions if you run into difficulty.

Sinan Ünür