Updated on March 8th, 2021
Based on comments
Loading modules
from whoosh.index import create_in
from whoosh.fields import *
from whoosh.qparser import QueryParser
import pandas as pd
Defining search term
TERM = "second"
Creating indices for this example
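import os
# create_in expects the index directories to exist already
os.makedirs("indexdir1", exist_ok=True)
os.makedirs("indexdir2", exist_ok=True)
# ix1 stores the title, ix2 stores the content
schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
ix1 = create_in("indexdir1", schema)
schema = Schema(title=TEXT, path=ID(stored=True), content=TEXT(stored=True))
ix2 = create_in("indexdir2", schema)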
writer1 = ix1.writer()
writer2 = ix2.writer()
for writer in [writer1, writer2]:
    writer.add_document(title=u"First document", path=u"/a",
                        content=u"This is the first document we've added! Not the second")
    writer.add_document(title=u"Second document", path=u"/b",
                        content=u"The second one is even more interesting than the first one!")
    writer.add_document(title=u"Third document", path=u"/c",
                        content=u"You know... This is also different from the second one!")
    writer.commit()
Searching by the term
results = []
parser = QueryParser("title", ix1.schema)
query = parser.parse(TERM)
results += list(ix1.searcher().search(query))
parser = QueryParser("content", ix2.schema)
query = parser.parse(TERM)
results += list(ix2.searcher().search(query))
So far your results are
print(results)
[<Hit {'path': '/b', 'title': 'Second document'}>, <Hit {'content': "This is the first document we've added! Not the second", 'path': '/a'}>, <Hit {'content': 'You know... This is also different from the second one!', 'path': '/c'}>, <Hit {'content': 'The second one is even more interesting than the first one!', 'path': '/b'}>]
The results are now in a single list, but they are not ranked in any way.
Transforming it into a dictionary data structure
result = {"path": [], "title": [], "content": []}
fields = ["path", "title", "content"]
for dct in results:
    for field in fields:
        result[field].append(dct.get(field, None))
Creating a pandas dataframe with the results
df = pd.DataFrame(result)
print(df)
The dataframe is:
  path            title                                            content
0   /b  Second document                                               None
1   /a             None  This is the first document we've added! Not th...
2   /c             None  You know... This is also different from the se...
3   /b             None  The second one is even more interesting than t...
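Note: None appears wherever a hit does not carry that field. In this example each index stores only the field it is searched on (plus the path), so None means the document did not match on that field.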
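To make the None behaviour concrete, here is a minimal sketch that uses plain dicts as stand-ins for the Whoosh hits. The dicts and their shortened texts are assumptions for illustration only; the loop is the same as the one above.

```python
# Plain dicts stand in for Whoosh Hit objects; each carries only the
# fields that its (hypothetical) index stores.
hits = [
    {"path": "/b", "title": "Second document"},                  # from the title index
    {"path": "/a", "content": "Not the second"},                 # from the content index
    {"path": "/b", "content": "The second one is interesting"},  # from the content index
]

fields = ["path", "title", "content"]
columns = {field: [] for field in fields}
for hit in hits:
    for field in fields:
        # .get returns None when the hit has no such field
        columns[field].append(hit.get(field))

print(columns["title"])
print(columns["content"])
```

Every column ends up with one entry per hit, which is what lets pandas build a rectangular dataframe from it.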
Grouping the results by "path" and counting results
groups = df.groupby(["path"]).count()
The groups
      title  content
path
/a        0        1
/b        1        1
/c        0        1
Creating a score column
groups["score"] = groups["title"] + groups["content"]
With the score column
      title  content  score
path
/a        0        1      1
/b        1        1      2
/c        0        1      1
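If pandas is not an option, the same count-per-path score can be sketched with the standard library alone. This is an alternative to the groupby approach, not part of it; the `matches` list is an assumption that mirrors the four hits of this example as (path, matched field) pairs.

```python
from collections import defaultdict

# One (path, field) pair per hit, naming the field that matched.
matches = [("/b", "title"), ("/a", "content"), ("/c", "content"), ("/b", "content")]

scores = defaultdict(int)
for path, _field in matches:
    scores[path] += 1  # one point per matching field

# Highest score first; ties are broken by path so the order is deterministic.
ranking = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
print(ranking)  # [('/b', 2), ('/a', 1), ('/c', 1)]
```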
Sorting results by score
print(groups.sort_values("score", ascending=False))
      title  content  score
path
/b        1        1      2
/a        0        1      1
/c        0        1      1
Note: With only three documents the sorted table looks much like the unsorted one. With real data the order will usually change.
Finally, you can iterate through the dataframe and present your results.
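A minimal sketch of that last step, assuming the counts shown in the grouped table above. The dataframe is rebuilt here from literal values so the snippet runs on its own, and the output format is only an example.

```python
import pandas as pd

# Counts copied from the grouped table; "score" is title + content.
groups = pd.DataFrame(
    {"title": [0, 1, 0], "content": [1, 1, 1]},
    index=pd.Index(["/a", "/b", "/c"], name="path"),
)
groups["score"] = groups["title"] + groups["content"]

ranked = groups.sort_values("score", ascending=False)
lines = []
for row in ranked.itertuples():
    # row.Index holds the path because "path" is the dataframe index
    lines.append(f"{row.Index}: score {row.score}")
print("\n".join(lines))
```

The first line printed is the best match (/b); the order of the two tied paths is not guaranteed.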
End of update
Note: The original answer follows. It addresses the "at once" part specifically. I updated the post after some comments and decided to keep the original here, because it might still be useful.
Why not use concurrent.futures for that?
Starting
import concurrent.futures
from whoosh.filedb.filestore import FileStorage
from whoosh.qparser import QueryParser
storage = FileStorage("../indexdir")
ix_1 = storage.open_index(indexname='ind_1')
ix_2 = storage.open_index(indexname='ind_2')
ixs = [ix_1, ix_2]
TERM = "TERM TO SEARCH"
Defining search function
def search_things(ix, term):
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(term)
        results = searcher.search(query, terms=True)
        # copy the stored fields out before the searcher is closed
        return [hit.fields() for hit in results]
Parallelizing
# using two workers because there are two indices
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future_to_search = {executor.submit(search_things, ix, TERM): ix for ix in ixs}
    for future in concurrent.futures.as_completed(future_to_search):
        s = future_to_search[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (s, exc))
        else:
            # data is a collection of hits, so report how many there are
            print('Search (%r) finished => %d hits' % (s, len(data)))
You might need to adapt it for your needs.
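To try the pattern without any index on disk, here is a minimal sketch in which in-memory dicts stand in for the Whoosh indices. `INDICES` and `search_index` are hypothetical names made up for this illustration; only the submit / as_completed structure is the point.

```python
import concurrent.futures

# Hypothetical in-memory "indices": name -> {path: text}. They replace the
# Whoosh indices so the sketch runs on its own.
INDICES = {
    "ind_1": {"/a": "first document", "/b": "second document"},
    "ind_2": {"/b": "the second one", "/c": "different from the second one"},
}

def search_index(name, term):
    """Return the paths in one stand-in index whose text contains term."""
    return [path for path, text in INDICES[name].items() if term in text]

merged = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=len(INDICES)) as executor:
    future_to_name = {
        executor.submit(search_index, name, "second"): name for name in INDICES
    }
    for future in concurrent.futures.as_completed(future_to_name):
        # Futures finish in no fixed order, so key each result by index name.
        merged[future_to_name[future]] = future.result()

print(merged)
```

Keying the merged results by index name keeps track of where each hit came from, which is what the scoring step in the update relies on.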