
How can you detect / find out the meaning (the expansion) of an acronym using NLP / Information Extraction (IE) methods?

We want to detect whether a word or its acronym is used in free text and map both to the same entity / token.

Most papers available online are about medical acronyms, and they do not provide a library to accomplish this task.

Any ideas?

Thorsten Niehues
    Acronyms are almost always domain dependent. That is why it is not a good idea to have a "general" library. NLP, for example, could mean 'natural language processing' or 'neuro-linguistic programming', depending on the domain. – Chthonic Project Nov 03 '14 at 18:11
  • Your question is not clear to me. You mean, given a word, you want to find its acronym? – Daniel Nov 04 '14 at 07:04
    @Daniel yes I mean a mapping which maps the acronym to the extension – Thorsten Niehues Nov 04 '14 at 13:06
    @ChthonicProject yes I understand that it is domain dependent. But how can I create such a (domain specific) mapping using NLP / text mining – Thorsten Niehues Nov 04 '14 at 13:08

2 Answers


Reading your question and the comments, I understand that you want to create a mapping from an acronym to its expansion.

Assuming you have a collection of text documents in which both the acronym and its expansion occur, you can apply an algorithm to extract (acronym, expansion) pairs.

"A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A.S. Schwartz and M.A. Hearst does exactly this by looking for patterns such as a long form followed by its abbreviation in parentheses. The Java implementation is available here.
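To give an idea of how the pattern-based approach works, here is a condensed Python sketch of its core, roughly following the heuristics described in the Schwartz and Hearst paper. It is not the authors' Java implementation: it only handles the "long form (SHORT FORM)" order, and the function names (`is_valid_short_form`, `best_long_form`, `extract_pairs`) are hypothetical. The key step walks the short form and the words preceding the parenthesis from right to left, requiring every letter/digit of the acronym to appear in order and the first one to start a word.

```python
import re
from typing import List, Optional, Tuple


def is_valid_short_form(sf: str) -> bool:
    # Simplified filters on the parenthesized candidate: 2-10 characters,
    # at most two words, starts with a letter or digit, contains a letter.
    return (
        2 <= len(sf) <= 10
        and len(sf.split()) <= 2
        and sf[0].isalnum()
        and any(ch.isalpha() for ch in sf)
    )


def best_long_form(sf: str, lf: str) -> Optional[str]:
    # Walk both strings right-to-left: every alphanumeric character of the
    # short form must appear, in order, in the candidate long form, and the
    # first short-form character must sit at the start of a word.
    s, l = len(sf) - 1, len(lf) - 1
    while s >= 0:
        ch = sf[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        while (l >= 0 and lf[l].lower() != ch) or (
            s == 0 and l > 0 and lf[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None  # no consistent alignment found
        l -= 1
        s -= 1
    # Extend back to the beginning of the word holding the first matched char.
    start = lf.rfind(" ", 0, l + 1) + 1
    return lf[start:]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    # Only the "long form (SHORT FORM)" pattern is handled in this sketch.
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", sentence):
        sf = m.group(1).strip()
        if not is_valid_short_form(sf):
            continue
        # Look at most min(|SF| + 5, |SF| * 2) words back from the parenthesis.
        max_words = min(len(sf) + 5, len(sf) * 2)
        window = " ".join(sentence[: m.start()].split()[-max_words:])
        lf = best_long_form(sf, window)
        if lf and len(lf) > len(sf) and len(lf.split()) <= max_words:
            pairs.append((sf, lf))
    return pairs
```

For example, `extract_pairs("We use the Hidden Markov Model (HMM) here.")` should return `[("HMM", "Hidden Markov Model")]`: the leading words "We use the" are dropped because the alignment stops at the word where "H" is matched.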

I applied this algorithm to the English Wikipedia; you can see the results here. I also applied it to a collection of Portuguese news articles; the results are here.
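Once pairs have been extracted from a domain corpus, one possible way (an assumption on my part, not something the algorithm itself provides) to turn them into the domain-specific mapping asked about in the question is to keep the most frequent expansion per acronym and then substitute it into free text. The helpers `build_mapping` and `normalize` below are hypothetical; they take already-extracted (acronym, expansion) pairs as input, so running them on different corpora naturally yields different mappings for ambiguous acronyms such as NLP.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_mapping(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    # Count how often each (lower-cased) expansion was seen for each acronym
    # and keep the most frequent one; ties go to the first expansion seen.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for short_form, long_form in pairs:
        counts[short_form][long_form.lower()] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}


def normalize(text: str, mapping: Dict[str, str]) -> str:
    # Replace whole-word, case-sensitive acronym occurrences with their
    # lower-cased expansion. Assumes acronyms consist of word characters,
    # since the pattern relies on \b word boundaries.
    if not mapping:
        return text
    alternatives = sorted(mapping, key=len, reverse=True)  # longest first
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)
```

Lower-casing the rest of the text as well would make the acronym and its written-out form map to exactly the same token sequence.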

David Batista

WordNet contains acronyms for many words and can be used from a variety of programming languages: http://wordnet.princeton.edu/wordnet/

Alternatively, you can get them from Freebase. See also: What is one way to find related names using the web?

Daniel
    In my experience, WordNet contains very few acronyms. For example, UN, NATO and AA are there, but ISO, CERN, YMCA, CCTV, ... are not. – Ian Mercer Nov 04 '14 at 21:50