I am trying to determine the most popular keywords for a certain class of documents in my collection. Assuming that the domain is "computer science" (which, of course, includes networking, computer architecture, etc.), what is the best way to extract these domain-specific keywords from the text? I tried using WordNet, but I am not quite sure how best to use it to extract this information.
Is there a well-known list of words that I can use as a whitelist, given that I do not know all of the domain-specific keywords beforehand? Or are there any good NLP/machine learning techniques for identifying domain-specific keywords?