I have an index (size = 3.5 GB) of 5 million small documents indexed using Whoosh.
Since my documents have only a name and content, my schema is very simple and has only two fields: name and content.
from whoosh.fields import Schema, ID, TEXT
schema = Schema(name=ID(stored=True),
                content=TEXT(stored=True))
To test performance, I'm using a set of 70,000 queries, but Whoosh is taking about 20 seconds to execute each one.
from whoosh import scoring
from whoosh.index import open_dir
from whoosh.qparser import QueryParser

index = open_dir("../data/search/bm25_index/")
query_parser = QueryParser("content", schema=index.schema)
q = query_parser.parse("some query")
with index.searcher(weighting=scoring.TF_IDF()) as searcher:
    results = searcher.search(q)
Since the index is stateless, how can I run these searches in multiple threads?
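One possible shape for such a multi-threaded search is sketched below. It is a rough, untested-against-this-index illustration, not a confirmed solution. It assumes that each worker thread should open its own searcher from the shared index rather than sharing one searcher object, and that queries are parsed up front in the calling thread. The names `run_queries`, `open_searcher`, `run_one`, `whoosh_example`, `max_workers` and the `limit=10` cut-off are hypothetical and chosen only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_queries(queries, open_searcher, run_one, max_workers=4):
    """Run run_one(searcher, query) for every query on a thread pool.

    open_searcher() is called at most once per worker thread, so each
    thread gets its own searcher object. Results are returned in the
    same order as `queries`.
    """
    local = threading.local()
    opened = []              # every searcher created, so all can be closed
    lock = threading.Lock()

    def worker(query):
        searcher = getattr(local, "searcher", None)
        if searcher is None:
            # First task on this thread: open a thread-private searcher.
            searcher = open_searcher()
            local.searcher = searcher
            with lock:
                opened.append(searcher)
        return run_one(searcher, query)

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(worker, queries))
    finally:
        for searcher in opened:
            searcher.close()


def whoosh_example(index_dir, query_texts):
    """Hypothetical wiring to the Whoosh snippet above (requires whoosh)."""
    from whoosh import scoring
    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    ix = open_dir(index_dir)
    parser = QueryParser("content", schema=ix.schema)
    # Assumption: parse in the calling thread so the parser is never shared.
    parsed = [parser.parse(text) for text in query_texts]

    def run_one(searcher, query):
        hits = searcher.search(query, limit=10)
        # Copy the stored "name" field out before the searcher is closed.
        return [hit["name"] for hit in hits]

    return run_queries(
        parsed,
        open_searcher=lambda: ix.searcher(weighting=scoring.TF_IDF()),
        run_one=run_one,
    )
```

Calling `whoosh_example("../data/search/bm25_index/", ["some query"])` would return one list of document names per query. Since Whoosh is pure Python, threads may give little speedup under CPython's global interpreter lock; the same per-worker-searcher pattern could be tried with a process pool instead.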