I am working on a natural language processing application. I have a text describing 30 domains. Each domain is defined with a short paragraph that explains it. My aim is to build a thesaurus from this text so I can determine from an input string
which domains are concerned. The text is about 5000 words and each domains is described by 150 words. My questions are :
Do I have a long enough text to create a thesaurus from ?
Is my idea of building a thesaurus legit or should I just use NLP libraries to analyse my corpus and the input string ?
At the moment, I have calculated the number total of occurrence of each words grouped by domains because I first thought of a indexed approach. But I am really not sure which method is the best. Does someone have experience in both NLP and thesaurus building ?