How to use multiple document processors in vespa.ai in separate search chains?

Question

I need to use multiple document processors in my Vespa use case. I need to modify feeds based on different conditions, and I cannot use document processor chaining: each processor has to be a separate one that I can select every time I insert a feed. I have tried the services.xml configuration below.


    <document-processing>
      <chain id="foo">
        <documentprocessor id="com.abc.xyz.Test" bundle="abc-xyz-one" />
      </chain>
      <chain id="bar">
        <documentprocessor id="com.abc.xyz.Test2" bundle="abc-xyz-one" />
      </chain>
    </document-processing>

Request http://<IP>:<port>/document/v1/test2/test2/docid/<id>/;&chain=foo

Here I am getting a timeout.

score 2 · Accepted Answer · answered Jan 07 '20 at 14:42

To add multiple document processors, put them in the same chain:

    <document-processing>
      <chain id="default">
        <documentprocessor id="com.abc.xyz.Test" bundle="abc-xyz-one" />
        <documentprocessor id="com.abc.xyz.Test2" bundle="abc-xyz-one" />
      </chain>
    </document-processing>

(I don't think you need this here, but if you do need multiple chains, you have to configure routing. This is because you usually want to route to processing chains based on operation attributes rather than leave the choice to clients.)

Jon

2,043
11
9

Could you please tell me how to configure routing? Also, if I change the name of my document processor from 'default' to 'foo', how do I then use 'foo' as my document processor? – suyash308 Jan 08 '20 at 07:59
It's the *chain* that's named "default", or "foo" here. The chain can contain any number of document processors. I don't see a reason to care about the name of the chain, but if you really want this, see https://docs.vespa.ai/documentation/routing.html. – Jon Jan 09 '20 at 08:06
@Jon As you mentioned in your answer, "you usually want to route to processing chains depending on operation attributes, not leave it up to clients". For a particular use case, "DocumentPut" operations on a particular namespace and document type (e.g. id:test:test::), I need to control when my processing runs during data insertion and when it doesn't. Can this be handled somehow in the default chain? – Vikrant Thakur Jan 09 '20 at 10:50
Write a document processor which rejects those operations. You can also declare a document selection expression for a content cluster such that only documents matching it will be routed to/retained in it https://docs.vespa.ai/documentation/reference/services-content.html#documents – Jon Jan 10 '20 at 11:32
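
The document selection mentioned in the last comment could look roughly like the following services.xml fragment. This is only a sketch: the cluster id `mycontent` and the field `priority` are hypothetical names, and the rest of the content cluster definition is left out.

```xml
<!-- services.xml (fragment); "mycontent" and "priority" are hypothetical -->
<content id="mycontent" version="1.0">
  <documents>
    <!-- Only test2 documents matching the selection are routed to/retained in this cluster -->
    <document type="test2" mode="index" selection="test2.priority &gt; 0" />
  </documents>
  <!-- redundancy and nodes elements omitted for brevity -->
</content>
```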

score 1 · Answer 2 · answered Jan 08 '20 at 10:11

The /document/v1 HTTP API (described at https://docs.vespa.ai/documentation/document-api.html) does not support a chain parameter. It has a 'route' parameter, which lets you send messages through a route whose hops can be docproc chains.

It's unclear what you really want to do, but you can route document operations to different document processing chains by having one route per chain (the choice of which route to send to then has to be made outside of Vespa); see https://docs.vespa.ai/documentation/routing.html. The vespa-route command-line utility is very handy for figuring out what the hop names are.
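
As a rough illustration of one route per chain, a routing table in services.xml might look like the sketch below. The route names `foo-route` and `bar-route` are made up for this example, and the hop names assume a container cluster with id `container` plus an `indexing` hop; check the actual hop names in your application with vespa-route before relying on them.

```xml
<!-- services.xml (fragment): one route per document processing chain -->
<routing version="1.0">
  <routingtable protocol="document">
    <!-- Hop names are assumptions; list the real ones with vespa-route -->
    <route name="foo-route" hops="container/chain.foo indexing" />
    <route name="bar-route" hops="container/chain.bar indexing" />
  </routingtable>
</routing>
```

With routes like these in place, the client would pick the chain through the 'route' parameter, for example http://<IP>:<port>/document/v1/test2/test2/docid/<id>?route=foo-route.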
