What is the equivalent to an ElasticSearch Instance and Index in Vespa?

Question

Basic question for a Vespa newbie: In ElasticSearch I stand up a cluster of some number of nodes using best practices. I then create an Index for each of my tenants/clients, thus walling off my clients' data from each other. I grow/shape the Elastic cluster as necessary to handle load.

What would be the analogous terms in Vespa parlance?

score 2 · Answer 1 · answered Dec 14 '21 at 07:23

In Vespa, each client can have its schema in its own Application Package, deployed on its own set of nodes. This gives total isolation.

It is possible to have multiple schemas in an application package (see https://docs.vespa.ai/en/schemas.html#multiple-schemas), running on the same set of nodes, or even on different nodes using different clusters. This can be useful for clients with many applications that are OK to co-locate on the same node set.
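As a rough illustration of the multiple-schemas setup, the sketch below generates the content-cluster part of a services.xml with one document type per tenant. The cluster id "tenants", the schema names, the host alias "node1" and the redundancy value are made-up placeholders, and each schema would still need its own .sd file in the application package. Python is used here only to produce the XML; it is not part of Vespa.

```python
import xml.etree.ElementTree as ET


def content_cluster(cluster_id, tenant_schemas, redundancy=2):
    """Build a <content> element listing one document type per tenant schema.

    All names passed in are hypothetical; adapt them to your own package.
    """
    content = ET.Element("content", id=cluster_id, version="1.0")
    ET.SubElement(content, "redundancy").text = str(redundancy)

    # One <document> entry per schema; all of them share the same nodes.
    documents = ET.SubElement(content, "documents")
    for schema in tenant_schemas:
        ET.SubElement(documents, "document", type=schema, mode="index")

    # A single placeholder node; a real cluster would list its own hosts.
    nodes = ET.SubElement(content, "nodes")
    ET.SubElement(nodes, "node", {"hostalias": "node1", "distribution-key": "0"})
    return content


cluster = content_cluster("tenants", ["tenant_a", "tenant_b"])
xml_text = ET.tostring(cluster, encoding="unicode")
print(xml_text)
```

Putting the tenants in separate content clusters instead would mean emitting one such element per tenant, each with its own node list.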

Hence, there are many ways to set this up, depending on how well you want to isolate the applications.

Further reading:

answered Dec 14 '21 at 07:23

Kristian Aune

Thank you for that great answer! I'm coming from an Elastic world. And I consider an ES Index level isolation to be analogous to a Database (vs schema) level isolation in PostgreSQL, for example. One ESIndex/Tenant or DB/Tenant. It's just tricky at this point in my learning curve on Vespa to grok that equivalent with this tool. For my use-case, where I'm hip deep in R&D and a multi-tenant product, I'd much prefer to be able to leverage a single well provisioned 'instance' of a tool shared by tenants vs. each tenant requiring their own instance. But that's not a deal breaker... – Gary Teichrow Dec 14 '21 at 19:14
right - in that case you can use a schema per tenant to start with, see Jon's comment on limitations – Kristian Aune Dec 16 '21 at 13:02

score 2 · Answer 2 · answered Dec 14 '21 at 09:38

The equivalent to different indexes in ES is to have different schemas: Each schema in a content cluster gets its own indexes and other data structures.

Note that (as in ES) this isn't perfect isolation:

It's on you to make sure each of your tenants can only query their own schema(s).
You don't have resource isolation, so one tenant may consume so much memory or CPU that it impacts others.
You're not providing each tenant with the full expressive power of an application package (which is much more than in ES), while retaining isolation.
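For the first point, one possible approach is to have your own backend build every query and pin it to the caller's schema, rather than letting tenants send arbitrary queries. The sketch below assumes a hypothetical tenant-to-schema mapping and uses the Vespa query API's model.restrict parameter to limit a query to one document type; the function and mapping names are illustrative and not part of Vespa.

```python
# Hypothetical mapping from tenant id to the schema (document type) it owns.
TENANT_SCHEMAS = {
    "acme": "tenant_acme",
    "globex": "tenant_globex",
}


def build_query_params(tenant_id, user_query, hits=10):
    """Return query API parameters restricted to the tenant's own schema."""
    schema = TENANT_SCHEMAS.get(tenant_id)
    if schema is None:
        # Unknown tenants get no access at all.
        raise PermissionError(f"unknown tenant: {tenant_id}")
    return {
        "yql": "select * from sources * where userQuery()",
        "query": user_query,
        # Set by the server, never taken from the tenant's request.
        "model.restrict": schema,
        "hits": hits,
    }


params = build_query_params("acme", "blue shoes")
print(params)
```

The important design choice is that the restriction is derived from the authenticated tenant, not from anything the tenant sends. This covers query isolation only and does nothing for the resource isolation point.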

You can, of course, achieve all of these by setting up different Vespa system instances for each tenant, at the cost of additional administration overhead and running different (e.g. Docker) containers for each.

In principle you can also create a single system hosting multiple applications, where each application runs on different Docker containers and is therefore fully isolated. This is what Vespa Cloud does, but it's a lot more work and not documented.

Thank you for those great comments! – Gary Teichrow Dec 14 '21 at 19:22
