I'm trying to deploy a Vespa cluster on 10 physical machines, each running 5 content nodes. Redundancy is set to 2, and I don't want a document and its replica to end up on the same physical machine. So I grouped the content nodes by the physical machine they run on and set the distribution partitions to 1|*.

Here is my groups configuration:

<group name="top-group" distribution-key="0">
    <distribution partitions="1|*"/>
    <group name="machine1" distribution-key="1">
        <node hostalias="content11" distribution-key="11"/>
        <node hostalias="content12" distribution-key="12"/>
        <node hostalias="content13" distribution-key="13"/>
        <node hostalias="content14" distribution-key="14"/>
        <node hostalias="content15" distribution-key="15"/>
    </group>
    <group name="machine2" distribution-key="2">
        <node hostalias="content21" distribution-key="21"/>
        <node hostalias="content22" distribution-key="22"/>
        <node hostalias="content23" distribution-key="23"/>
        <node hostalias="content24" distribution-key="24"/>
        <node hostalias="content25" distribution-key="25"/>
    </group>
    <group name="machine3" distribution-key="3">
        <node hostalias="content31" distribution-key="31"/>
        <node hostalias="content32" distribution-key="32"/>
        <node hostalias="content33" distribution-key="33"/>
        <node hostalias="content34" distribution-key="34"/>
        <node hostalias="content35" distribution-key="35"/>
    </group>
    <group name="machine4" distribution-key="4">
        <node hostalias="content41" distribution-key="41"/>
        <node hostalias="content42" distribution-key="42"/>
        <node hostalias="content43" distribution-key="43"/>
        <node hostalias="content44" distribution-key="44"/>
        <node hostalias="content45" distribution-key="45"/>
    </group>
    <group name="machine5" distribution-key="5">
        <node hostalias="content51" distribution-key="51"/>
        <node hostalias="content52" distribution-key="52"/>
        <node hostalias="content53" distribution-key="53"/>
        <node hostalias="content54" distribution-key="54"/>
        <node hostalias="content55" distribution-key="55"/>
    </group>
    <group name="machine6" distribution-key="6">
        <node hostalias="content61" distribution-key="61"/>
        <node hostalias="content62" distribution-key="62"/>
        <node hostalias="content63" distribution-key="63"/>
        <node hostalias="content64" distribution-key="64"/>
        <node hostalias="content65" distribution-key="65"/>
    </group>
    <group name="machine7" distribution-key="7">
        <node hostalias="content71" distribution-key="71"/>
        <node hostalias="content72" distribution-key="72"/>
        <node hostalias="content73" distribution-key="73"/>
        <node hostalias="content74" distribution-key="74"/>
        <node hostalias="content75" distribution-key="75"/>
    </group>
    <group name="machine8" distribution-key="8">
        <node hostalias="content81" distribution-key="81"/>
        <node hostalias="content82" distribution-key="82"/>
        <node hostalias="content83" distribution-key="83"/>
        <node hostalias="content84" distribution-key="84"/>
        <node hostalias="content85" distribution-key="85"/>
    </group>
    <group name="machine9" distribution-key="9">
        <node hostalias="content91" distribution-key="91"/>
        <node hostalias="content92" distribution-key="92"/>
        <node hostalias="content93" distribution-key="93"/>
        <node hostalias="content94" distribution-key="94"/>
        <node hostalias="content95" distribution-key="95"/>
    </group>
    <group name="machine10" distribution-key="10">
        <node hostalias="content101" distribution-key="101"/>
        <node hostalias="content102" distribution-key="102"/>
        <node hostalias="content103" distribution-key="103"/>
        <node hostalias="content104" distribution-key="104"/>
        <node hostalias="content105" distribution-key="105"/>
    </group>
</group>

When I try to deploy my application, I get this error:

Request failed. HTTP status code: 400
Invalid application package: default.default: Error loading model: In indexed content cluster 'site' using hierarchic distribution: Expected number of leaf groups (10) to be a factor of redundancy (2), but it is not.

I do not understand this error. What should I change in my configuration to fix it?

dkurzaj

1 Answer

The documentation on Document Distribution is missing an important limitation: the use case you describe is not supported for mode="index", only for mode="streaming" (Streaming Search) and mode="store-only".

When using mode="index", each search is routed to a single group, which allows increased throughput. That means every group must hold one copy of each document, which is why the deployment check requires the redundancy to be a multiple of the number of leaf groups.
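
To illustrate that constraint, here is a minimal sketch of a grouped layout that mode="index" should accept, because the redundancy (2) is a multiple of the number of leaf groups (2), so each group receives one copy of every document. The cluster id 'site' is taken from the error message; the document type, group names, host aliases, and the split into two groups are assumptions for illustration only.

```xml
<!-- Sketch only: document type, group names and host aliases are hypothetical. -->
<content id="site" version="1.0">
    <!-- 2 copies in total, 2 leaf groups: one copy per group -->
    <redundancy>2</redundancy>
    <documents>
        <document type="mydoc" mode="index"/>
    </documents>
    <group>
        <distribution partitions="1|*"/>
        <group name="group0" distribution-key="0">
            <node hostalias="content1" distribution-key="0"/>
            <node hostalias="content2" distribution-key="1"/>
        </group>
        <group name="group1" distribution-key="1">
            <node hostalias="content3" distribution-key="2"/>
            <node hostalias="content4" distribution-key="3"/>
        </group>
    </group>
</content>
```

In a layout like this, the two copies of a document land on different physical machines only if no machine hosts nodes from both groups.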

With 10 physical machines, run one content node on each machine to ensure that a replica is always stored on a different machine. In other words, do not combine hierarchical distribution with several content nodes per physical machine.
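
As a rough sketch of that recommendation, a flat cluster with one content node per physical machine and a redundancy of 2 could look like the fragment below. The cluster id 'site' comes from the error message; the document type and the host aliases content1 to content10 are placeholders that do not appear in the original configuration.

```xml
<!-- Sketch only: document type and host aliases are hypothetical. -->
<content id="site" version="1.0">
    <redundancy>2</redundancy>
    <documents>
        <document type="mydoc" mode="index"/>
    </documents>
    <!-- No groups: one content node per physical machine, so the two
         copies of a document are always on different machines. -->
    <nodes>
        <node hostalias="content1" distribution-key="0"/>
        <node hostalias="content2" distribution-key="1"/>
        <node hostalias="content3" distribution-key="2"/>
        <node hostalias="content4" distribution-key="3"/>
        <node hostalias="content5" distribution-key="4"/>
        <node hostalias="content6" distribution-key="5"/>
        <node hostalias="content7" distribution-key="6"/>
        <node hostalias="content8" distribution-key="7"/>
        <node hostalias="content9" distribution-key="8"/>
        <node hostalias="content10" distribution-key="9"/>
    </nodes>
</content>
```

Each host alias would need to map to a distinct physical machine in hosts.xml for the separation to hold.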

  • Ok, thank you. So there is no way I can have a redundancy of 2, on 5 physical machines, and ensure that the document is stored on a separate machine from its replica? I would need to have a redundancy of at least 5 (and other multiples of 5), and define my 5 machines as groups, to ensure that replica is not stored on the same machine? Or maybe Vespa automatically tries to separate the document from its replica, even though they are in the same group? – dkurzaj Sep 27 '18 at 12:56
  • Why do you want to use hierarchical distribution? In search it is used to scale an application to handle a higher query load, a query can be served by any group in the hierarchical distribution. – Bjørn Meland Sep 27 '18 at 13:29
  • Or rather, why do you want to split your physical nodes into 5 content nodes? The content nodes are C++ so the problem with big Java processes is not relevant. I suggest creating 10 content nodes which are 1-1 with physical nodes and no grouping. – Jon Sep 27 '18 at 18:07
  • Oh ok, actually I had run some tests, and in those I got better performance with 5 content nodes on one machine than with only one. But now I realize that my results might have been biased because I had the feed concurrency setting (https://docs.vespa.ai/documentation/content/setup-proton-tuning.html#feeding-concurrency) set to 1. So maybe that's why the search performance was not as good with only one content node per machine... – dkurzaj Sep 28 '18 at 12:37