
I have JSON dumps from an API whose content is in Hindi. I first fetched and saved these JSON files like this:

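import json

def retrieve_data():
    '''Get articles from the API and save them locally, one file per page.'''
    i = 1
    while True:
        # get_articles() is defined elsewhere (not shown here)
        articles = get_articles(page_no=i)
        if not articles:
            break
        # Python 3: write UTF-8 explicitly so the Hindi text does not depend on the locale
        with open('dumps/%d.json' % i, 'w', encoding='utf-8') as ijson:
            json.dump(articles, ijson, ensure_ascii=False)
            # I also tried
            # json.dump(articles, ijson)
        i += 1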
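To rule out an encoding problem in the dumps themselves, a round-trip check along these lines may help. This is only a sketch: the sample record and the temporary file path are made up for illustration and do not come from the real API.

```python
import json
import os
import tempfile

# Hypothetical sample record standing in for one page of articles.
sample = [{"title": "नरेंद् मोदी"}]

# Write to a throwaway directory instead of the real dumps/ folder.
path = os.path.join(tempfile.mkdtemp(), "1.json")

# Write with an explicit UTF-8 encoding and keep the Hindi text unescaped.
with open(path, "w", encoding="utf-8") as fh:
    json.dump(sample, fh, ensure_ascii=False)

# Read the file back the same way and compare with what was written.
with open(path, encoding="utf-8") as fh:
    loaded = json.load(fh)

round_trip_ok = loaded == sample
print(round_trip_ok)
```

If the comparison holds for real dump files as well, the stored text is probably intact and the problem is more likely in the indexing or search step.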

Now, after indexing, when I search for different Hindi words I get mixed results.

For example, नरेंद् मोदी is a person's name that occurs many times in my indexed articles. When I search for नरेंद्, I easily get all the matches, but when I search for मोदी, I do not get a single result. The same thing happens with other Hindi words: for some I get results, for others I do not.

I cannot understand what is wrong here. For English words my Whoosh search works perfectly and I do not face a single problem, which is why I assume my Whoosh code is correct.
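One thing that may be worth checking is how the two words from the example are split into tokens. The sketch below does not call Whoosh at all. It assumes the index uses a regex tokenizer built on `\w`, which is what Whoosh's default tokenizer does as far as I know, and compares that with a hypothetical helper, `mark_aware_tokens`, that keeps Devanagari vowel signs attached to their consonants.

```python
import re
import unicodedata

text = "नरेंद् मोदी"

# Roughly what a \w-based regex tokenizer sees. Python's \w does not
# match combining marks such as Devanagari vowel signs (categories Mn/Mc),
# so words get split wherever such a mark occurs.
regex_tokens = re.findall(r"\w+", text)


def mark_aware_tokens(s):
    """Hypothetical helper: group letters, combining marks and digits into tokens."""
    tokens, current = [], []
    for ch in s:
        # The first letter of the Unicode category: L = letter, M = mark, N = number.
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(regex_tokens)
print(mark_aware_tokens(text))
```

If the regex version breaks मोदी into single characters, an analyzer that also discards very short tokens would leave nothing to match, which could explain why some words return results and others do not. That is a guess until the actual indexing code is inspected.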

Rahul Shrivastava
    It works for English because the analyzer and tokenizer work for English. Could you provide the code that calls the Whoosh APIs to index the documents? – algrebe Apr 09 '16 at 10:13

0 Answers