I'd like to implement a keyword blacklist using Elasticsearch. Basically, I want to store a list of banned queries that users are not allowed to search for. Then, given an incoming query (the "checked query"), I want to find out which banned queries it matches, if any.
A checked query matches a banned query if the banned query's keywords are a subset of the checked query's keywords, i.e. every keyword in the banned query also appears in the checked query. For example:
- Banned Queries:
  - "black lives"
  - "black lives matter"
  - "black lives matters"
  - "black lives matter rulez"
- Checked Query: "black lives matter"
- Matches:
  - "black lives"
  - "black lives matter"
Only the first two banned queries match, because their keywords are subsets of the checked query's keywords (the second one is an exact match, which also counts). The third banned query doesn't match because it uses "matters", not "matter". The last banned query doesn't match because it isn't a subset of "black lives matter": it has an additional keyword, "rulez".
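To pin down the matching rule independently of Elasticsearch, here is a minimal Python sketch of the intended behavior. It assumes keywords are whitespace-separated and compared case-insensitively; the function names `keywords` and `matching_banned` are made up for this illustration.

```python
def keywords(query: str) -> frozenset[str]:
    """Split a query into lowercase, whitespace-separated keywords."""
    return frozenset(query.lower().split())


def matching_banned(checked: str, banned: list[str]) -> list[str]:
    """Return the banned queries whose keywords all appear in `checked`."""
    checked_kw = keywords(checked)
    # `<=` on sets is the (non-strict) subset test, so exact matches count.
    return [b for b in banned if keywords(b) <= checked_kw]


banned_queries = [
    "black lives",
    "black lives matter",
    "black lives matters",
    "black lives matter rulez",
]

matches = matching_banned("black lives matter", banned_queries)
print(matches)
```

Keyword order and repeated keywords are ignored here because both sides are reduced to sets.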
I've been told that the best way to implement this is a percolate index. My question is how do I create a percolate query that implements a subset match against a checked query (the incoming document)?
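For concreteness, one possible shape for such a setup is sketched below as plain Python dicts that build the request bodies. This is an untested sketch, not a confirmed solution: it assumes each banned query is stored as a `match` query with `"operator": "and"` in a `percolator` field, and that the checked query is percolated as a document. The field name `text` and the helper names are hypothetical, and the default analyzer is assumed so that "matter" and "matters" remain distinct tokens.

```python
import json

# Hypothetical mapping: "query" holds the stored banned queries,
# "text" is the (made-up) field the checked query is indexed into.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "text": {"type": "text"},
        }
    }
}


def banned_query_doc(banned: str) -> dict:
    """Document to index: matches only if ALL banned keywords are present."""
    return {
        "query": {
            "match": {"text": {"query": banned, "operator": "and"}}
        }
    }


def percolate_body(checked: str) -> dict:
    """Search body: percolate the checked query as an in-memory document."""
    return {
        "query": {
            "percolate": {"field": "query", "document": {"text": checked}}
        }
    }


print(json.dumps(banned_query_doc("black lives"), indent=2))
print(json.dumps(percolate_body("black lives matter"), indent=2))
```

Under these assumptions, the hits returned by the percolate search would be the stored banned queries whose keywords all occur in the checked query.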
Here is the documentation page about percolate queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
Here is a related answer about subset matching: https://discuss.elastic.co/t/subset-in-an-array/237459