I have JSON dumps from an API whose content is in Hindi. First I fetched the JSON files like this:
import json

def retrieve_data():
    '''Get articles from api and save locally.'''
    i = 1
    while True:
        # get_articles (not shown) fetches one page of articles from the API
        articles = get_articles(page_no=i)
        if not articles or len(articles) < 1:
            break
        with open('dumps/%d.json' % i, 'w') as ijson:
            json.dump(articles, ijson, ensure_ascii=False)
            # I also tried
            # json.dump(articles, ijson)
        i = i+1
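As a sanity check on the dump step, the two json.dump variants mentioned above can be compared on a small sample. This is only a minimal sketch using the standard library; the sample list and its title key are made up for illustration and are not the real API response. Both variants should decode back to the same Hindi text, so the choice of ensure_ascii should only change how the file looks on disk.

```python
import json

# Hypothetical sample standing in for one page of API results.
sample = [{"title": "नरेंद् मोदी"}]

# Variant 1: keep the Devanagari characters as-is in the output.
raw_text = json.dumps(sample, ensure_ascii=False)

# Variant 2: default behaviour, non-ASCII characters become \uXXXX escapes.
escaped_text = json.dumps(sample)

# Both forms should decode back to the same Python objects.
same_after_load = json.loads(raw_text) == json.loads(escaped_text) == sample

# The escaped form should contain ASCII characters only.
escaped_is_ascii = all(ord(ch) < 128 for ch in escaped_text)

print(raw_text)
print(escaped_text)
print(same_after_load, escaped_is_ascii)
```

If both checks hold for the real data as well, the dump step is probably not where the Hindi text gets damaged.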
Now, after indexing, when I search for different Hindi words I get mixed results.
For example, नरेंद् मोदी is the name of a person that occurs many times in my indexed articles. When I search for नरेंद्, I easily get all the matches, but when I search for मोदी, I do not get a single result. The same thing happens with other Hindi words: for some I get results, but for others I do not.
I can't understand what is wrong here. For English words my Whoosh search works perfectly and I have not faced a single problem, which is why I assume my Whoosh code is correct.
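One way to narrow this down is to look at the code points inside each search term and at how a plain \w+ regular expression splits it. The sketch below uses only the standard library and does not touch Whoosh; whether the analyzer used for the index tokenizes text this way is an assumption that would need checking against the actual schema. The helper name describe is made up for illustration.

```python
import re
import unicodedata

def describe(word):
    '''Return (character name, Unicode category) for each code point.'''
    return [(unicodedata.name(ch), unicodedata.category(ch)) for ch in word]

# The two search terms from the example above.
terms = ['नरेंद्', 'मोदी']

# Unicode categories per code point: Lo = letter, Mn/Mc = combining mark.
categories = {term: [cat for _, cat in describe(term)] for term in terms}

# Python's \w matches letters and digits but not combining marks,
# so vowel signs and the virama can split a word into pieces.
pieces = {term: re.findall(r'\w+', term) for term in terms}

for term in terms:
    print(term, categories[term], pieces[term])
```

If a term comes back as several short pieces here, it may be worth checking how the index tokenizes the same text, for example by printing the tokens the analyzer produces for one article.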