I have been working on a text analysis task in which I need to identify the most frequently used words in a paragraph.
I am using the algorithmia npm package for this; it returns the words repeated most often in my text.
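To make the frequency-based behaviour concrete, here is a minimal sketch of plain "most repeated words" tagging. It is a hypothetical stand-in (`topWords` and the stopword list are made up for illustration), not the algorithmia API. It lowercases the text, drops a few stopwords, counts each exact word and returns the top entries. Because only exact words are counted, 'pricing', 'cost' and 'payment' each get a count of 1 and are never pooled.

```javascript
// Hypothetical stand-in for a "most repeated words" tagger; NOT the algorithmia API.
const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'or', 'of', 'to', 'is', 'in', 'for', 'on']);

// Returns the n most frequent non-stopword words as [word, count] pairs.
function topWords(text, n) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    if (STOPWORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    // Highest count first; ties broken alphabetically so the output is stable.
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, n);
}

console.log(topWords('The demo explains pricing. The demo shows the cost. Payment is monthly.', 2));
// → [['demo', 2], ['cost', 1]]
```

Here 'demo' wins with 2 occurrences even though three cost-related words appear, which is exactly the effect described in issue (2).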
The package works quite well, but I still have two issues:
(1) I get an array of tags like the one below:
['integrate', 'integration', 'policy', 'conversation', 'demo', 'test']
Here, 'integrate' and 'integration' have essentially the same meaning, so I want to drop 'integrate'.
(2) The process picks tags from the words repeated most often. My input paragraph contains words like 'pricing', 'cost' and 'payment', but because they are not exact matches of each other, each one is counted separately and I do not get a tag like 'cost' (or something similar).
Solving either of these issues would help me with the task.
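One way to attack issue (2) without a general-purpose synonym library is a small, hand-maintained concept map that sends several surface words to one canonical tag before counting. This is a different technique from thesaurus lookups: the map contents, `CONCEPTS`, `buildLookup` and `topConcepts` below are hypothetical and only illustrate the idea.

```javascript
// Hypothetical, hand-maintained concept map: canonical tag -> related words.
// The word lists are illustrative assumptions; a real list would come from your domain.
const CONCEPTS = {
  cost: ['cost', 'costs', 'price', 'prices', 'pricing', 'payment', 'payments', 'fee', 'fees'],
  integration: ['integrate', 'integrates', 'integrated', 'integration', 'integrations'],
  demo: ['demo', 'demos'],
};

// Invert the map once so each word can look up its canonical tag directly.
function buildLookup(concepts) {
  const lookup = new Map();
  for (const [tag, words] of Object.entries(concepts)) {
    for (const word of words) lookup.set(word, tag);
  }
  return lookup;
}

// Count canonical tags instead of raw words; words missing from the map are ignored.
function topConcepts(text, concepts, n) {
  const lookup = buildLookup(concepts);
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    const tag = lookup.get(word);
    if (tag === undefined) continue;
    counts.set(tag, (counts.get(tag) || 0) + 1);
  }
  return [...counts.entries()]
    // Highest count first; ties broken alphabetically.
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, n);
}

console.log(topConcepts('Our pricing page lists the cost. Payment is due monthly. The demo shows the integration.', CONCEPTS, 3));
// → [['cost', 3], ['demo', 1], ['integration', 1]]
```

Because unknown words are skipped, one option is to use these concept counts alongside the package's own ranking rather than instead of it.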
I have already tried many libraries for synonyms, nouns, verbs, etc., but none of them seems to work out. These are the packages I have already tried:
thesaurus-com
sentence-similarity
string-similarity
compromise
wordnet
node-snowball
datamuse
I have also tried setting a similarity threshold to match 'integrate' with 'integration'. That does remove the 'integrate' tag, but it also affects some of my other tags that need to stay.
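For issue (1), a stricter alternative to a global similarity threshold is to merge only tags that reduce to the same stem: 'integrate' and 'integration' collide, while tags that merely look alike usually do not. The sketch below uses a deliberately tiny, hypothetical suffix stripper (`SUFFIXES`, `stem` and `dedupeByStem` are illustrative names); a full stemming algorithm such as Porter or Snowball handles far more cases and typically reduces both words to the same stem. Keeping the longest word of each group is also an assumption, chosen so 'integration' survives.

```javascript
// Hypothetical minimal suffix stripper, NOT a real stemming algorithm.
// Longer suffixes come first so "ations" is tried before "ation", "s", etc.
const SUFFIXES = ['ations', 'ation', 'ates', 'ated', 'ate', 'ions', 'ion', 'ing', 'es', 'ed', 's'];

function stem(word) {
  for (const suffix of SUFFIXES) {
    // Keep at least 3 characters so short words are not stripped down to nothing.
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Collapse tags that share a stem. Each group keeps the position of its first
// (highest-ranked) member but is labelled with its longest surface form.
function dedupeByStem(tags) {
  const groups = new Map(); // stem -> chosen tag; Map keeps first-insertion order
  for (const tag of tags) {
    const key = stem(tag.toLowerCase());
    const current = groups.get(key);
    if (current === undefined || tag.length > current.length) {
      groups.set(key, tag); // updating an existing key does not move it
    }
  }
  return [...groups.values()];
}

console.log(dedupeByStem(['integrate', 'integration', 'policy', 'conversation', 'demo', 'test']));
// → ['integration', 'policy', 'conversation', 'demo', 'test']
```

Since two tags are merged only when their stems are identical, unrelated tags such as 'policy' and 'demo' are left untouched, unlike with a distance threshold.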
Thanks in advance.