Given a large corpus of documents indexed with Whoosh, I am trying to retrieve the titles (an indexed field) together with their associated document numbers.
How can I retrieve both the document number and the title of each document from the index, as pairs?
Background: I indexed my corpus from a pandas df like this:
```python
import os

from whoosh import index
from whoosh.fields import Schema, TEXT

schema = Schema(content=TEXT(stored=True),
                abstract=TEXT(stored=True),
                title=TEXT(stored=True))  # create Whoosh schema

if not os.path.exists("indexdir"):
    os.mkdir("indexdir")  # create index location
ix = index.create_in("indexdir", schema)  # create a new (empty) index

writer = ix.writer()
# loop variable is "_" (not "index") so the whoosh.index module is not shadowed
for _, row in df.iterrows():  # index preprocessed columns from df
    writer.add_document(title=row["new_title"], content=row["new_content"],
                        abstract=row["new_abstract"])
writer.commit()  # finish indexing and close the writer
```