
I would like to know how to use proximity search with Whoosh. I have read the Whoosh documentation, which says that the class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None) can be used for proximity search.

For example, I need to find "Hello World" in the index, where "Hello" may be up to 5 words away from "World".

At the moment I am using the following code, and it works fine with the normal parser.

from whoosh.query import *
from whoosh import qparser

index_path = "/home/abhi/Desktop/CLIR/indexdir_test"

ix = open_dir(index_path)

query='Hello World'

ana = StandardAnalyzer(stoplist=stop_word)


qp = QueryParser("content", schema=ix.schema,termclass=Phrase)
q=qp.parse(query)
with ix.searcher() as s:
   results = s.search(qp,limit=5)
   for result in results:
       print(result['content']+result['title'])
       print (result.score)
   print(len(results)) 

Please help me use the class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None) for proximity search, so that I can vary the allowed distance between the words. Thanks in advance.

Abhishek Kaushik

1 Answer


What you want is a slop factor of 5.

A few points:

  1. When you search, you must pass the query (q), not the query parser (qp): results = s.search(q, limit=5)

  2. limit refers to the maximum number of documents to return, not the slop factor. Your limit=5 parameter is saying you want to get up to 5 search results back (in case you were thinking this is the slop).

  3. You can remove termclass=Phrase
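To make the slop factor concrete, here is a minimal pure-Python sketch of the idea. It is a simplified model, not Whoosh's implementation: the function name within_slop and the whitespace tokenization are assumptions made for illustration. It treats a slop of 1 as "the words are adjacent", which matches the documented default of slop=1 for an exact phrase.

```python
def within_slop(text, first, second, slop):
    """Return True if `second` occurs after `first` within `slop` positions.

    Simplified model of an ordered proximity match: tokens are split on
    whitespace and lowercased, and a distance of 1 means adjacent words.
    """
    tokens = text.lower().split()
    first_positions = [i for i, tok in enumerate(tokens) if tok == first.lower()]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second.lower()]
    # Match if any occurrence of `second` is 1..slop positions after `first`.
    return any(
        1 <= b - a <= slop
        for a in first_positions
        for b in second_positions
    )


print(within_slop("Hello big wide World", "hello", "world", 5))
print(within_slop("Hello big wide World", "hello", "world", 1))
```

In this model "Hello big wide World" has a distance of 3 between the two words, so it matches with a slop of 5 but not with the default slop of 1.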

You can construct a phrase query two ways:

  1. Using a query string. Good for passing along a user query. Append ~ and the slop factor to the phrase for proximity search. If you want phrase terms to be up to 5 words apart: "hello world"~5
  2. Using a SpanNear2 query. This lets you structure the query programmatically. Pass all your phrase terms as a list of Term objects and specify slop as a constructor parameter.
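from whoosh.qparser import QueryParser
from whoosh.query import Term, spans

# ix is the index opened with open_dir(index_path), as in the question
with ix.searcher() as s:

  # Option 1: Query string (the parser runs the field's analyzer on the words)
  query   = '"Hello World"~5'
  qp      = QueryParser("content", schema=ix.schema)
  q       = qp.parse(query)
  results = s.search(q, limit=5)

  # Option 2: SpanNear2
  # Term objects bypass the analyzer, so give the terms as they are indexed
  # (lowercase, assuming the field uses StandardAnalyzer)
  q = spans.SpanNear2([Term("content", "hello"), Term("content", "world")], slop=5)
  results = s.search(q, limit=5)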

Steven
  • Thanks, Steven, for your answer. I need a bit of help with scoring functions: I need to use a language model for Whoosh scoring. Could you please guide me? – Abhishek Kaushik Mar 10 '19 at 17:35
  • Sure, if this answered your original proximity question, kindly accept it as the answer and then you can post a new question or point me to an existing one and I'll see if I can help. – Steven Mar 10 '19 at 17:45
  • Here is the link to my question: https://stackoverflow.com/questions/47944961/language-modal-through-whoosh-in-information-retrieval – Abhishek Kaushik Mar 11 '19 at 14:27