I have been going through many Libraries like whoosh/nltk and concepts like word net.
However I am unable to tackle my problem. I am not sure if I can find a library for this or I have to build this using the above mentioned resources.
Question: My scenario is that I have to search for key words. Say I have key words like 'Sales Document' / 'Purchase Documents' and have to search for them in a small 10-15 pages book.
The catch is: Now they can also be written as 'Sales should be documented' or 'company selling should be written in the text files'. (For Sales Document - Keyword) Is there an approach here or will I have to build something?
The code for the POS Tags is as follows. If no library is available I will have to proceed with this.
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from pandas import Series
import nltk
from nltk.corpus import wordnet
def tag(x):
return pos_tag(word_tokenize(x))
synonyms = []
antonyms = []
for syn in wordnet.synsets("Sales document"):
#print("Down2")
print (syn)
#print("Down")
for l in syn.lemmas():
print(" \n")
print(l)
synonyms.append(l.name())
if l.antonyms():
antonyms.append(l.antonyms()[0].name())
print(set(synonyms))
print(set(antonyms))
for i in synonyms:
print(tag(i))
Update: We went ahead and made a python program - Feel free to fork it. (Pun intended) Further the Git Dhund is very untidy right now will clean it once completed. Currently it is still in a development phase.
The is the link.