Whoosh Concurrency Search

Question

I have an index (size = 3.5 GB) of 5 million small documents indexed using Whoosh.

My documents have only a name and content, so my schema is very simple and has only two fields: name and content.

from whoosh.fields import Schema, ID, TEXT

schema = Schema(name=ID(stored=True),
                content=TEXT(stored=True))

To test performance, I'm using a set of 70,000 queries, but Whoosh is taking about 20 seconds to execute each one.

from whoosh import scoring
from whoosh.index import open_dir
from whoosh.qparser import QueryParser

index = open_dir("../data/search/bm25_index/")
query_parser = QueryParser("content", schema=index.schema)
q = query_parser.parse("some query")
with index.searcher(weighting=scoring.TF_IDF()) as searcher:
    results = searcher.search(q)

Since the index is stateless, how can I run these searches in multiple threads?

score 0 · Answer 1 · answered Sep 20 '20 at 08:26

You can use Python's built-in concurrency support. Look at multiprocessing.Pool or multiprocessing.Process.
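A minimal sketch of a thread-based version, assuming each thread should get its own Whoosh Searcher (I have not confirmed that a single Searcher is safe to share across threads, so the sketch avoids doing that). The helper names run_parallel and whoosh_search_all are hypothetical; the field names name and content, the index path, and the TF_IDF weighting come from the question. Queries are parsed once up front, and each worker reuses one searcher for every query it handles instead of opening a new one per query.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_parallel(items, open_searcher, search_one, max_workers=4):
    """Hypothetical helper: apply search_one(searcher, item) to every item on a thread pool.

    Each worker thread lazily opens its own searcher with open_searcher()
    and reuses it for every item it handles, so no searcher is shared
    between threads. Results are returned in the same order as items.
    """
    local = threading.local()
    opened = []  # every searcher created, so all of them can be closed at the end
    opened_lock = threading.Lock()

    def task(item):
        searcher = getattr(local, "searcher", None)
        if searcher is None:
            searcher = open_searcher()
            local.searcher = searcher
            with opened_lock:
                opened.append(searcher)
        return search_one(searcher, item)

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(task, items))
    finally:
        # The pool has shut down by now, so no thread is still using a searcher.
        for searcher in opened:
            searcher.close()


def whoosh_search_all(index_dir, query_strings, max_workers=4, limit=10):
    """Hypothetical helper: run many Whoosh queries and return the stored names per query."""
    # Whoosh is imported lazily so run_parallel itself has no Whoosh dependency.
    from whoosh import scoring
    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    ix = open_dir(index_dir)
    parser = QueryParser("content", schema=ix.schema)
    # Parse every query once in the calling thread; worker threads only search.
    parsed = [parser.parse(q) for q in query_strings]

    def open_searcher():
        return ix.searcher(weighting=scoring.TF_IDF())

    def search_one(searcher, query):
        return [hit["name"] for hit in searcher.search(query, limit=limit)]

    return run_parallel(parsed, open_searcher, search_one, max_workers)


# Example (requires Whoosh and the index from the question):
# hits = whoosh_search_all("../data/search/bm25_index/", ["some query"], max_workers=8)
```

Because Whoosh is written in pure Python, the GIL lets only one thread execute Python code at a time, so threads mainly help while searches wait on disk reads. If scoring turns out to be CPU-bound, the same structure moved to multiprocessing.Pool (opening the index once per worker process in the Pool initializer) is more likely to use several cores.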

user3070752