Efficiency of the ElasticSearch percolator versus querying

Question

I have job candidates and job listings. I am trying to determine which candidates are qualified for a particular listing. We already have every listing indexed in ES. I see two ways to do this:

1. Index every candidate in ES, then build a query from the parameters of the listing to search/filter down to qualified candidates, and return those as results.
2. Use the percolate feature to create a percolate query for each candidate, then find out which candidates match by running the listing's data against the candidate percolator index.

Which is more efficient and performant at scale (millions of records)? I don't fully understand how the percolator is implemented (I haven't found any articles that actually explain the implementation), so my concern is that with the percolator I'd effectively be running one query per candidate per listing, which would be very inefficient.

score 0 · Answer 1 · answered Nov 05 '15 at 16:38

With the Percolator, you're running a search query over a "queries" index. So in your case, the relative 'work' performed by Elasticsearch would be similar in both cases:

C: Number of Candidates
CQ: Number of Candidate-Job-Search-Alert-Queries

(Based on your description, C = CQ in your system)

Option 1. Index all Candidates. Each time a new job is added, run a search over the candidates index for candidates that match the job's characteristics. (Searches C records)

Option 2. Register 1 Job-Search-Alert-Query per Candidate in a .percolator index. Each time a new job is added, use the Percolate API to identify matching Candidate-Job-Search-Alert-Queries. (Searches CQ query records)
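To make the two options concrete, here is a minimal sketch of the request bodies each one might send. The field names (`location`, `years_experience`, `min_years`), the index names in the comments, and the helper functions are all hypothetical, and `location` is assumed to be indexed as an exact (not analyzed) value. The endpoint shapes in the comments follow the `.percolator` type and `_percolate` API from the Elasticsearch releases current when this was written; later releases moved to a `percolator` field type and a `percolate` query, so treat this as an illustration rather than a drop-in.

```python
import json
from typing import Any, Dict


def candidate_search_body(listing: Dict[str, Any]) -> Dict[str, Any]:
    """Option 1: search the candidates index with a query built from a listing.

    Would be sent as e.g. POST /candidates/_search (index name is hypothetical).
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"location": listing["location"]}},
                    {"range": {"years_experience": {"gte": listing["min_years"]}}},
                ]
            }
        }
    }


def candidate_alert_query(candidate: Dict[str, Any]) -> Dict[str, Any]:
    """Option 2, registration: one stored query per candidate, written against
    listing fields.

    Would be sent as e.g. PUT /jobs/.percolator/<candidate_id>.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"location": candidate["location"]}},
                    {"range": {"min_years": {"lte": candidate["years_experience"]}}},
                ]
            }
        }
    }


def percolate_body(listing: Dict[str, Any]) -> Dict[str, Any]:
    """Option 2, matching: percolate the new listing against the stored queries.

    Would be sent as e.g. GET /jobs/listing/_percolate.
    """
    return {"doc": listing}


if __name__ == "__main__":
    listing = {"location": "Berlin", "min_years": 3}
    candidate = {"location": "Berlin", "years_experience": 6}
    print(json.dumps(candidate_search_body(listing), indent=2))
    print(json.dumps(candidate_alert_query(candidate), indent=2))
    print(json.dumps(percolate_body(listing), indent=2))
```

Note that the two query builders are mirror images: Option 1 expresses the listing's requirements over candidate fields, while Option 2 expresses each candidate's attributes over listing fields. Requirements that do not invert this cleanly are a reason to prefer Option 1.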

From a performance/scalability perspective, a bigger concern is that the Percolator requires the entire .percolator index to be loaded into memory.

From a functionality perspective, the Percolator does not support certain query types which you may need (which would be a vote in favor of Option 1).

If you find yourself in a situation where CQ << C (e.g. user-saved searches), then it's more likely that the Percolator approach would outperform having to query the entire candidate index.
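The C versus CQ argument can be illustrated with a toy model in plain Python. This is only a counting sketch under a big simplifying assumption (a linear scan over records or stored queries); it is not how Elasticsearch executes either option internally, and the `Candidate` class and helper names are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Listing = Dict[str, object]


@dataclass
class Candidate:
    name: str
    location: str
    years_experience: int


def match_by_search(candidates: List[Candidate], listing: Listing) -> Tuple[List[str], int]:
    """Option 1 (toy): check every candidate record against the listing."""
    hits = [
        c.name
        for c in candidates
        if c.location == listing["location"]
        and c.years_experience >= listing["min_years"]
    ]
    return hits, len(candidates)  # work is proportional to C


def make_saved_query(c: Candidate) -> Callable[[Listing], bool]:
    """Build the stored query a candidate would register (toy predicate)."""
    return lambda listing: (
        listing["location"] == c.location
        and listing["min_years"] <= c.years_experience
    )


def match_by_percolation(
    saved_queries: Dict[str, Callable[[Listing], bool]], listing: Listing
) -> Tuple[List[str], int]:
    """Option 2 (toy): check every stored query against the listing."""
    hits = [name for name, query in saved_queries.items() if query(listing)]
    return hits, len(saved_queries)  # work is proportional to CQ


candidates = [
    Candidate("ana", "Berlin", 6),
    Candidate("bo", "Berlin", 2),
    Candidate("cy", "Paris", 9),
]
listing = {"location": "Berlin", "min_years": 3}

# C = CQ: every candidate has exactly one stored query, so the work is the same.
all_queries = {c.name: make_saved_query(c) for c in candidates}
print(match_by_search(candidates, listing))        # (['ana'], 3)
print(match_by_percolation(all_queries, listing))  # (['ana'], 3)

# CQ < C: only some candidates registered a query, so fewer checks are needed.
some_queries = {c.name: make_saved_query(c) for c in candidates if c.name != "bo"}
print(match_by_percolation(some_queries, listing))  # (['ana'], 2)
```

In this model the two options do the same amount of work when every candidate has one stored query, and the percolation side only pulls ahead once the number of stored queries drops below the number of candidate records.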
