
Background: The Wikipedia page on Sequence Alignment says that DNA sequence alignment algorithms can also be used for natural language processing.

Question: Since Named Entity Recognizers (NER) and DNA sequencing libraries both perform approximate string matching, is it practical to build an NER on top of a DNA sequencing library (like Bowtie)?

One reason to build an NER on a DNA sequencing library, rather than use an existing open-source NER, is the hope of getting misspelling correction automatically.

If this supposition makes sense, is there an online DNA sequencing tool where I can load my database of celebrity names instead of DNA sequences, and then search for a misspelling such as 'Michale Jacksun' in the hope that it matches 'Michael Jackson' from the database?

Tushar Goswami
    Hi, I have been working on Named Entity Recognition for some time now. I did not know that Sequence Alignment is used for NER; it is definitely not a mainstream approach. If you are interested only in spelling variations, you are better off matching tokens with an edit distance tolerance than using Optimal Matching (a Sequence Alignment based approach). – Vihari Piratla Dec 23 '15 at 06:51

1 Answer


DNA sequencing tools do make use of edit distance algorithms, the same algorithms you would use to detect misspellings during NER. However, open-source DNA sequencing tools are typically programmed to operate only on the few characters used to denote DNA sequences. They do not operate on the usual a-z, A-Z, 0-9 range of ASCII characters. Citation: https://groups.google.com/forum/#!category-topic/nvbio-users/how-do-i--/ITjD6KPlEsc

So, as Vihari also advised, it is best to use an edit distance algorithm directly. Still, I really hope that NLP enthusiasts will explore these open-source DNA sequencing tools and adapt them, so that their 'big data capacities' become available to the NLP community.
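For illustration, here is a minimal sketch of matching names by edit distance directly. It is not taken from any particular NER library: the function names `levenshtein` and `best_match`, the `max_distance` threshold of 3, and the sample name list are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def best_match(query, names, max_distance=3):
    """Return the closest name, or None if nothing is within max_distance.

    max_distance is a hypothetical tolerance; tune it for your data.
    """
    if not names:
        return None
    scored = [(levenshtein(query.lower(), name.lower()), name) for name in names]
    distance, name = min(scored)
    return name if distance <= max_distance else None


# Hypothetical database of celebrity names.
celebrities = ["Michael Jackson", "Janet Jackson", "Michael Jordan"]
print(best_match("Michale Jacksun", celebrities))
```

This compares the query against every name, which is fine for a small list. For a very large database you would probably need an index (for example a BK-tree or n-gram index) to avoid a full scan.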

Tushar Goswami