I need to modify the Lucene source code. Specifically, I want to change the code that matches query terms against the index, so that I can add a new matching scheme.
I started by writing a search script using PyLucene. You can find it here. From it, it seems clear that I need to modify the IndexSearcher class, because this is the class that implements the actual search functionality.
Then I traced the code more deeply to find the exact code to modify. This is the call hierarchy I followed:
- In my code I am calling
searcher.search(query, None, 100)
on line 57. Then I followed it into the IndexSearcher code.
- First it calls
public TopDocs search(Query query, Filter filter, int n)
on line 271, which calls
protected TopDocs search(Weight weight, ScoreDoc after, int nDocs)
on line 428.
- In this function I am assuming my code is single-threaded (in SearchFiles.py I haven't requested a multi-threaded search), so it calls
protected TopDocs search(List<AtomicReaderContext> leaves, Weight weight, ScoreDoc after, int nDocs)
on line 472.
- This function creates a collector object and calls
protected void search(List<AtomicReaderContext> leaves, Weight weight, Collector collector)
on line 599.
- In this function there are two more relevant calls:
Scorer scorer = weight.scorer(ctx, !collector.acceptsDocsOutOfOrder(), true, ctx.reader().getLiveDocs());
on line 613 and
scorer.score(collector);
on line 616. One of these calls must select the docs that match the query, so one of them is the one I care about.
- I then followed both calls but was unable to find anything, as explained below.
weight.scorer:
Initially, when the Weight object is created by the call to createNormalizedWeight(FilteredQuery)
on line 273, it appears to be created from the constructor of the Weight
class itself, not from any child class. In Weight.java, scorer()
is an abstract function (line 113). Where is the implementation that IndexSearcher actually uses defined?
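To make the question concrete: Java does not allow an abstract class to be instantiated directly, so the object I get back must be an instance of some subclass that I have not located yet. I assume the pattern looks roughly like the sketch below. All class names here are hypothetical stand-ins that I made up for illustration, not Lucene's real classes:

```java
// Hypothetical stand-ins for illustration only; NOT Lucene's real classes.
abstract class SketchWeight {
    // Abstract, like the scorer() I found in Weight.java: no body here.
    abstract String scorerName();
}

abstract class SketchQuery {
    // Each concrete query type decides which weight subclass to build.
    abstract SketchWeight createWeight();
}

class SketchTermQuery extends SketchQuery {
    private final String term;

    SketchTermQuery(String term) {
        this.term = term;
    }

    @Override
    SketchWeight createWeight() {
        // An anonymous subclass supplies the missing scorerName() body.
        return new SketchWeight() {
            @Override
            String scorerName() {
                return "term-scorer(" + term + ")";
            }
        };
    }
}

class SketchSearcher {
    // The searcher only sees the abstract types; the concrete method
    // is chosen at run time by dynamic dispatch.
    static String search(SketchQuery query) {
        SketchWeight weight = query.createWeight();
        return weight.scorerName();
    }
}
```

With these classes, SketchSearcher.search(new SketchTermQuery("lucene")) returns "term-scorer(lucene)" even though SketchSearcher never names a concrete weight class. If Lucene works along these lines, the code I need to change would live in whichever subclass the query object creates, not in IndexSearcher.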
scorer.score():
Scorer.java extends DocsEnum.java, which extends DocIdSetIterator.java.
Scorer uses the function nextDoc(),
and I think this is the function that decides which docs are relevant (i.e. matches docs to the query). But again, this is an abstract function, declared in DocIdSetIterator
(line 92). So again, where is the implementation that IndexSearcher actually uses defined?
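My current understanding of how scoring drives nextDoc() can be sketched as follows. This is only a guess at the shape of the code: the class names, the sentinel constant, and the in-memory postings array are all assumptions of mine, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT Lucene's real classes.
abstract class SketchDocIterator {
    // Assumed sentinel meaning "no further matching documents".
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Abstract, like the nextDoc() I found in DocIdSetIterator.
    abstract int nextDoc();
}

abstract class SketchScorer extends SketchDocIterator {
    // Concrete driver loop built on top of the abstract nextDoc():
    // it collects whatever doc ids the subclass chooses to return.
    List<Integer> score() {
        List<Integer> collected = new ArrayList<>();
        int doc;
        while ((doc = nextDoc()) != NO_MORE_DOCS) {
            collected.add(doc);
        }
        return collected;
    }
}

class SketchPostingsScorer extends SketchScorer {
    private final int[] postings; // sorted ids of docs containing the term
    private int pos = 0;

    SketchPostingsScorer(int[] postings) {
        this.postings = postings;
    }

    @Override
    int nextDoc() {
        // The matching decision lives here: only ids in the postings
        // array are ever handed back to the driver loop.
        return pos < postings.length ? postings[pos++] : NO_MORE_DOCS;
    }
}
```

In this sketch, new SketchPostingsScorer(new int[]{2, 5, 9}).score() returns the list [2, 5, 9]. If Lucene follows a similar design, the matching logic I want to change would sit in a concrete nextDoc() of some Scorer subclass, which is the class I am trying to find.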
Lucene supports several retrieval models (Vector Space, Language Models, Okapi BM25) and multiple query types, so it may be that choosing the model and the query type determines which implementations of these functions are used. But in SearchFiles.py I haven't specified anywhere which query type or model I am using, so Lucene must be making some default choices. What I am unable to understand is where these default choices are made, and how they are wired into the IndexSearcher code so that IndexSearcher ends up using these functions.