I'm trying to specify a Haystack pipeline using the declarative YAML syntax. I want to run a pipeline with two "lanes" whose answers will be merged: one using an EmbeddingRetriever to fetch answers from a query embedding, and one using a (sparse) BM25Retriever. I want both retrievers to use the same Elastic instance, accessed via two ElasticsearchDocumentStore instances. Example:
components:
  - name: DenseStore
    type: ElasticsearchDocumentStore
    params:
      embedding_dim: 384  # This parameter is required for the embedding_model
      index: dense_index
  - name: SparseStore
    type: ElasticsearchDocumentStore
    params:
      index: sparse_index
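To make the intended two-lane layout concrete, the full file might look roughly like the sketch below. This is an illustration under assumptions, not my exact configuration: the component names DenseRetriever, SparseRetriever and JoinResults, the embedding model (chosen only because it produces 384-dimensional vectors), and the join_mode are all placeholders.

```yaml
version: ignore

components:
  - name: DenseStore
    type: ElasticsearchDocumentStore
    params:
      embedding_dim: 384
      index: dense_index
  - name: SparseStore
    type: ElasticsearchDocumentStore
    params:
      index: sparse_index
  # Dense lane: retrieves by query embedding (model name is an assumption)
  - name: DenseRetriever
    type: EmbeddingRetriever
    params:
      document_store: DenseStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2
  # Sparse lane: keyword retrieval with BM25
  - name: SparseRetriever
    type: BM25Retriever
    params:
      document_store: SparseStore
  # Merges the results of both lanes (join_mode is an assumption)
  - name: JoinResults
    type: JoinDocuments
    params:
      join_mode: concatenate

pipelines:
  - name: query
    nodes:
      - name: DenseRetriever
        inputs: [Query]
      - name: SparseRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [DenseRetriever, SparseRetriever]
```

Each retriever refers to its store by component name, which is why the two stores need distinct names.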
At first I thought the problem was with trying to specify multiple DocumentStore instances, but discovered this wasn't it. The problem seems to be that one must use the name DocumentStore for a document store in the YAML file, which precludes specifying two DocumentStore instances wrapping the same Elastic instance.
My first attempt was to build the pipeline in Python on Colab as described above, using two InMemoryDocumentStore instances. This worked as expected. But when moving to a production setting, I wanted to use the Haystack Docker image (run with an Elastic instance under Docker Compose) and simply read in the YAML to specify the pipeline. When I did this, I got an error that the Haystack DocumentStores could not connect to Elastic on localhost:9200. Running a test with a simplified YAML pipeline that uses the name DocumentStore for the document store component does connect successfully to Elastic. Obviously this isn't a solution, because I want two DocumentStore instances and they can't have the same name in the pipeline YAML.
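For comparison, the simplified test that does connect was along the lines of the sketch below. It is a reconstruction for illustration rather than a copy of the actual file: the retriever component name and the choice of index are placeholders.

```yaml
version: ignore

components:
  # The store is named DocumentStore here, unlike in the two-store setup
  - name: DocumentStore
    type: ElasticsearchDocumentStore
    params:
      index: sparse_index
  # Hypothetical retriever name, used only for this illustration
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore

pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
```

In this sketch, what changes relative to the failing setup is the number of stores and the component name; no connection parameters are set on the store in either case.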