There are basically two ways to achieve the use case you are looking for.
Solution 1: Using wildcard query
Assuming that you have two fields, name and campus, both of type text, below is how your Java code would look:
private static void wildcardQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Wildcard Query");
MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
//Using wildcard query
WildcardQueryBuilder nameClause = QueryBuilders.wildcardQuery("name", "nel*");
//Main Query
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(nameClause)
.should(campusClause_1)
.should(campusClause_2)
.minimumShouldMatch(1);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
//specify your index name in the below parameter
searchRequest.indices("my_wildcard_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
System.out.println("-----------------------------------------------------");
}
Note that if the above fields were of type keyword and you needed an exact, case-sensitive match, you would use a term query instead, for example:
TermQueryBuilder campusClause_2 = QueryBuilders.termQuery("campus", "Bradford");
Solution 2: Using Edge Ngram tokenizer (Preferred Solution)
For this you would need to make use of the Edge Ngram tokenizer.
Below is how your mapping would be:
Mapping:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": "lowercase",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "my_analyzer"
},
"campus": {
"type": "text"
}
}
}
}
Sample Documents:
PUT my_index/_doc/1
{
"name": "Nelson Mandela",
"campus": "Bradford"
}
PUT my_index/_doc/2
{
"name": "Nel Chaz",
"campus": "Oxford"
}
Query DSL:
POST my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "nel"
}
}
],
"should": [
{
"match": {
"campus": "bradford"
}
},
{
"match": {
"campus": "oxford"
}
}
],
"minimum_should_match": 1
}
}
}
Java Code:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Bool Query");
MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
//Plain old match query would suffice here
MatchQueryBuilder nameClause = QueryBuilders.matchQuery("name", "nel");
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(nameClause)
.should(campusClause_1)
.should(campusClause_2)
.minimumShouldMatch(1);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("my_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
}
Note how I've just made use of a plain match query for the name field. I'd suggest you read a bit about analysis, analyzers, tokenizers and the edge-ngram tokenizer in particular.
In the console, you should be able to see the total number of matching documents.
Similarly, you can also make use of other query types in the above solutions, e.g. a term query if you are looking for an exact match on a keyword field.
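As a recap of what the bool query in both solutions expresses, here is a minimal plain-Java sketch with no Elasticsearch involved. It assumes single-word campus values, treats the name clause as a case-insensitive prefix check on each word, and adds a third, hypothetical student that is not part of the sample documents. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: mirrors "must name AND at least one should campus".
class BoolQuerySketch {

    static final class Student {
        final String name;
        final String campus;

        Student(String name, String campus) {
            this.name = name;
            this.campus = campus;
        }
    }

    // Stands in for the must clause: some word of the name starts with the prefix.
    static boolean nameMatches(Student student, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        for (String word : student.name.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.startsWith(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Stands in for the should clauses with minimum_should_match = 1:
    // at least one of the listed campuses has to match.
    static boolean campusMatches(Student student, List<String> campuses) {
        for (String campus : campuses) {
            if (student.campus.equalsIgnoreCase(campus)) {
                return true;
            }
        }
        return false;
    }

    static List<String> search(List<Student> students, String prefix, List<String> campuses) {
        List<String> hits = new ArrayList<>();
        for (Student student : students) {
            if (nameMatches(student, prefix) && campusMatches(student, campuses)) {
                hits.add(student.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("Nelson Mandela", "Bradford"),
                new Student("Nel Chaz", "Oxford"),
                new Student("Nella Example", "Cambridge")); // hypothetical extra document
        // Prints [Nelson Mandela, Nel Chaz]
        System.out.println(search(students, "nel", Arrays.asList("oxford", "bradford")));
    }
}
```

Without minimumShouldMatch(1), should clauses that sit next to a must clause only influence scoring, so the third student would also be returned.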
Updated Answer:
Personally I do not recommend Solution 1, as wildcard queries waste a lot of computational power even for a single field, let alone for multiple fields.
In order to do multi-field sub-string matches, the best way would be to make use of a mapping parameter called copy_to and then apply the Edge N-Gram tokenizer to the combined field.
So what does this Edge N-Gram tokenizer really do? Put simply, based on min_gram and max_gram, it breaks your tokens down into prefixes. For example, with min_gram set to 3, Zeppelin becomes Zep, Zepp, Zeppe, Zeppel, Zeppeli, Zeppelin, and these values are inserted into the inverted index of that field. Now if you execute a very simple match query, it would return that document, as your inverted index contains that prefix.
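The prefix expansion described above can be sketched in a few lines of plain Java. This is only an approximation of what the tokenizer plus the lowercase filter produce for a single word, not Elasticsearch's actual implementation; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: edge n-grams of one token, lowercased.
class EdgeNGramSketch {

    // Returns the lowercased prefixes of the token whose lengths run from
    // minGram up to min(maxGram, token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [zep, zepp, zeppe, zeppel, zeppeli, zeppelin]
        System.out.println(edgeNGrams("Zeppelin", 3, 10));
    }
}
```

Tokens shorter than min_gram produce no grams at all in this sketch, which is worth keeping in mind when choosing min_gram for short names.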
And about the copy_to field:
The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field.
Using copy_to, we have the below mapping for the two fields campus and name.
Mapping:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": "lowercase",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"copy_to": "search_string"
},
"campus": {
"type": "text",
"copy_to": "search_string"
},
"search_string": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Notice in the above mapping how name and campus are both copied into search_string via copy_to, and how I've applied the Edge N-gram analyzer only to search_string. Note that this consumes disk space, so you may want to take a step back and make sure that you do not use this analyzer for all the fields; but again, it depends on the use case that you have.
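To see why a single combined field is enough, here is a rough in-memory sketch in plain Java of what copy_to plus the edge n-gram analyzer amount to. It assumes min_gram 3 and max_gram 10 as in the mapping, approximates the analyzer by splitting on non-letter/non-digit characters, and runs the query text through the same analysis with OR semantics, as a default match query would. The class name, the method names and the second document are illustrative assumptions, not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: one inverted index fed by two fields.
class CopyToSketch {
    static final int MIN_GRAM = 3;
    static final int MAX_GRAM = 10;

    // token -> ids of the documents whose combined field produced that token
    private final Map<String, Set<Integer>> invertedIndex = new HashMap<>();

    // Rough stand-in for my_analyzer: lowercase, split into words,
    // then emit every prefix between MIN_GRAM and MAX_GRAM characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            int longest = Math.min(MAX_GRAM, word.length());
            for (int len = MIN_GRAM; len <= longest; len++) {
                tokens.add(word.substring(0, len));
            }
        }
        return tokens;
    }

    // Mimics copy_to: both field values feed the same search_string index.
    void index(int docId, String name, String campus) {
        for (String value : new String[] {name, campus}) {
            for (String token : analyze(value)) {
                invertedIndex.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
            }
        }
    }

    // Mimics a match query with OR semantics: a document is a hit
    // if it contains any of the analyzed query tokens.
    Set<Integer> match(String queryText) {
        Set<Integer> hits = new TreeSet<>();
        for (String token : analyze(queryText)) {
            hits.addAll(invertedIndex.getOrDefault(token, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        CopyToSketch index = new CopyToSketch();
        index.index(1, "Ramanujan", "Cambridge University");
        index.index(2, "Nelson Mandela", "Bradford"); // extra document for contrast
        System.out.println(index.match("ram"));  // prints [1]
        System.out.println(index.match("cam"));  // prints [1]
        System.out.println(index.match("brad")); // prints [2]
    }
}
```

Both "ram" (from name) and "cam" (from campus) find document 1 through the same index, which is the effect the search_string field gives you.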
Sample Document:
POST my_index/_doc/1
{
"campus": "Cambridge University",
"name": "Ramanujan"
}
Search Query:
POST my_index/_search
{
"query": {
"match": {
"search_string": "ram"
}
}
}
And that makes the Java code as simple as below:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Bool Query");
MatchQueryBuilder searchClause = QueryBuilders.matchQuery("search_string", "ram");
//Feel free to add multiple clauses
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(searchClause);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("my_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
}
Hope that helps!