
I'm trying to specify a Haystack pipeline using the declarative YAML syntax. The pipeline has two "lanes" whose results are merged: one uses an EmbeddingRetriever to fetch results by query embedding, and the other uses a (sparse) BM25Retriever. I want both retrievers to use the same Elastic instance, each accessed through its own ElasticsearchDocumentStore instance. Example:

components:
  - name: DenseStore
    type: ElasticsearchDocumentStore
    params:
      embedding_dim: 384 # This parameter is required for the embedding_model
      index: dense_index
  - name: SparseStore
    type: ElasticsearchDocumentStore
    params:
      index: sparse_index

At first I thought the problem was specifying multiple DocumentStore instances, but that turned out not to be it. The problem seems to be that a document store must be named DocumentStore in the YAML file, which rules out defining two DocumentStore instances that wrap the same Elastic instance.

My first attempt was to build the pipeline described above in Python on Colab, using two InMemoryDocumentStore instances. This worked as expected. For a production setting, though, I wanted to use the Haystack Docker image (run alongside an Elastic instance under Docker Compose) and simply read in the YAML to specify the pipeline. When I did this, I got an error saying the Haystack DocumentStores could not connect to Elastic on localhost:9200. A test with a simplified YAML pipeline that uses the name DocumentStore for the document store component does connect to Elastic successfully. Obviously this isn't a solution, because I want two DocumentStore instances and they can't share a name in the pipeline YAML.
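For reference, here is a minimal sketch of the full two-lane pipeline I have in mind, with the Elastic host set explicitly on each store instead of relying on the localhost default. The host value `elasticsearch` is an assumption (it stands for whatever the Elastic service is called in the Compose file), the embedding model is only an example of a 384-dimensional model, and the names DenseRetriever, SparseRetriever and Joiner are placeholders. I have not confirmed that this resolves the connection error.

```yaml
version: ignore

components:
  - name: DenseStore
    type: ElasticsearchDocumentStore
    params:
      host: elasticsearch   # assumption: Compose service name of the Elastic container
      port: 9200
      embedding_dim: 384    # must match the embedding model's output size
      index: dense_index
  - name: SparseStore
    type: ElasticsearchDocumentStore
    params:
      host: elasticsearch   # assumption: same Elastic instance as above
      port: 9200
      index: sparse_index
  - name: DenseRetriever    # placeholder name
    type: EmbeddingRetriever
    params:
      document_store: DenseStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2  # example 384-dim model
  - name: SparseRetriever   # placeholder name
    type: BM25Retriever
    params:
      document_store: SparseStore
  - name: Joiner            # placeholder name
    type: JoinDocuments
    params:
      join_mode: concatenate

pipelines:
  - name: query
    nodes:
      - name: DenseRetriever
        inputs: [Query]
      - name: SparseRetriever
        inputs: [Query]
      - name: Joiner
        inputs: [DenseRetriever, SparseRetriever]
```

Each retriever refers to its store by component name through `document_store`, so neither store needs to be called DocumentStore for the retrievers to find it.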

eglease