What might cause a memory error in Haystack search when used in a Python Flask application?

Question

I have indexed around 1000 documents in Elasticsearch. When I query them with Haystack, it returns results, but after about 5 consecutive queries a memory error occurs and the program stops. The code I used is below.

# Imports assume Haystack 1.x
import json

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import FARMReader, TfidfRetriever
from haystack.pipelines import ExtractiveQAPipeline

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

# Load the file names and contents, then write them to the index
with open("doc_json_file.json") as json_object:
    data_json = json.load(json_object)
document_store.write_documents(data_json)

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
pipe = ExtractiveQAPipeline(reader, retriever)

prediction = pipe.run(query=str(query), params={"Retriever": {"top_k": 20}, "Reader": {"top_k": 20}})

return prediction

The file names and file contents are stored in a JSON file. The error log is below.

OSError: [WinError 1455] The paging file is too small for this operation to complete
from .netcdf import netcdf_file, netcdf_variable
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 724, in exec_module
File "<frozen importlib._bootstrap_external>", line 818, in get_code
File "<frozen importlib._bootstrap_external>", line 917, in get_data
MemoryError
from pandas._libs.interval import Interval
ImportError: DLL load failed: The paging file is too small for this operation to complete.

Could you, please, share a more complete version of the code that would show how you use the code in flask? Can it be that you re-create the document store and upload the documents on each API call? — dmigo, Jul 22 '22 at 08:49

score 1 · Answer 1 · edited Jul 22 '22 at 13:04

If you're using ElasticsearchDocumentStore, it is better to use BM25Retriever instead. TfidfRetriever is a simpler retriever that does not require an inverted-index database like Elasticsearch.

The downside is that it has to keep all of its index data in memory, which can cause very high memory pressure. BM25Retriever combined with ElasticsearchDocumentStore uses almost the same (but slightly superior) retrieval model and does not have that issue.
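
As a rough sketch of that swap, assuming Haystack 1.x import paths: the snippet below replaces TfidfRetriever with BM25Retriever and builds the pipeline only once per process, which also covers the possibility raised in the comment that the document store is re-created on each API call. The helper names `build_pipeline`, `get_pipeline` and `answer` are hypothetical, and the connection settings are copied from the question.

```python
_pipeline = None  # hypothetical module-level cache, one pipeline per process


def build_pipeline():
    """Build the QA pipeline. Import paths assume Haystack 1.x."""
    from haystack.document_stores import ElasticsearchDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    # Connection settings copied from the question
    document_store = ElasticsearchDocumentStore(
        host="localhost", username="", password="", index="document"
    )
    # BM25 scoring runs inside Elasticsearch, so no index data is held in Python
    retriever = BM25Retriever(document_store=document_store)
    reader = FARMReader(
        model_name_or_path="deepset/roberta-base-squad2", use_gpu=True
    )
    return ExtractiveQAPipeline(reader, retriever)


def get_pipeline(builder=build_pipeline):
    """Return the cached pipeline, building it on the first call only."""
    global _pipeline
    if _pipeline is None:
        _pipeline = builder()
    return _pipeline


def answer(query, pipeline=None):
    """Run one query; call this from the Flask view instead of rebuilding."""
    pipe = pipeline if pipeline is not None else get_pipeline()
    return pipe.run(
        query=str(query),
        params={"Retriever": {"top_k": 20}, "Reader": {"top_k": 20}},
    )
```

With this layout the documents would be written to Elasticsearch once, in a separate indexing step, and the Flask view would only call `answer(query)`. Whether this removes the error in the original application depends on how the Flask code is structured, which the question does not show.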
