I have an Elasticsearch (ES) domain where querying by a document's emailId field returns no hits, even though the field and value exist in the document. Querying the same document by employeeId works. My index mapping is below.

{
  "properties": {
    "employeeId": {
      "type": "text",
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    },
    "emailId": {
      "type": "text",
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    }
  }
}

Below is how I'm running the search.

public SearchResponse searchForExactDocument(final String indexName, final Map<String, Object> queryMap)
            throws IOException {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
        queryMap.forEach((name, value) -> {
            queryBuilder.must(QueryBuilders.termQuery(name, value));
        });
        return this.executeSearch(indexName, queryBuilder);
    }

private SearchResponse executeSearch(final String indexName, final QueryBuilder queryBuilder) throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(queryBuilder);
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices(indexName);
        searchRequest.source(searchSourceBuilder);
        return restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    }

I called SearchRequest.source().toString(), and below is the search source it returned.

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "emailId": {
              "value": "21june6lambdatest7@gmail.com",
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}

Below is the document that should be returned, but I get no hits.

index{
  [
    person
  ][
    _doc
  ][
    null
  ],
  source[
    {
      "firstName": "MyEmployee",
      "lastName": "June6Test7",
      "emailId": "21june6lambdatest7@gmail.com",
      "employeeId": "13908528"
    }
  ]
}

I find it very odd that querying by employeeId works fine but querying by emailId does not. Any help would be much appreciated.

UPDATE: Following is my index creating method.

public CreateIndexResponse createIndex(final CreateIndexInput createIndexInput) throws IOException {
        CreateIndexRequest createIndexRequest = new CreateIndexRequest(createIndexInput.indexName());
        Settings.Builder settingsBuilder = Settings.builder();
        settingsBuilder.put(NUMBER_OF_SHARDS_KEY, createIndexInput.numOfShards());
        settingsBuilder.put(NUMBER_OF_REPLICAS, createIndexInput.numOfReplicas());
        settingsBuilder.put("analysis.analyzer.custom_uax_url_email.tokenizer", "uax_url_email");
        createIndexInput.mapping().ifPresent(mapping ->
                createIndexRequest.mapping(mapping, XContentType.JSON));
        createIndexRequest.settings(settingsBuilder.build());
        return restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
    }

1 Answer

A term query returns documents that contain the exact term in the given field; the query value itself is not analyzed. Query the emailId.keyword sub-field instead of emailId. The keyword sub-field stores the whole value as a single term, whereas the text field stores the tokens produced by the standard analyzer.

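By default, a text field uses the standard analyzer if no analyzer is specified. The employeeId value "13908528" stays a single token, which is why the same term query works for that field. The standard analyzer breaks "21june6lambdatest7@gmail.com", however, into the following tokens, neither of which equals the full address: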

{
  "tokens": [
    {
      "token": "21june6lambdatest7",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "gmail.com",
      "start_offset": 19,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
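
To see why the term query misses, it can help to model the lookup as an exact membership test against the indexed tokens. The sketch below is plain Java, not Elasticsearch code: the token list is copied from the analyzer output above, and the class and method names are made up for illustration.

```java
import java.util.List;

// Illustrative model only: a term query matches when the query term is exactly
// equal to one of the terms stored for the field. The query term is not analyzed.
final class TermLookupSketch {

    // Tokens the standard analyzer produced for emailId (copied from the output above).
    static final List<String> EMAIL_TEXT_TOKENS =
            List.of("21june6lambdatest7", "gmail.com");

    // The keyword sub-field keeps the whole value as one unmodified term.
    static final List<String> EMAIL_KEYWORD_TERMS =
            List.of("21june6lambdatest7@gmail.com");

    // Hypothetical helper: exact, case-sensitive comparison against the stored terms.
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String term = "21june6lambdatest7@gmail.com";
        System.out.println(termMatches(EMAIL_TEXT_TOKENS, term));   // false
        System.out.println(termMatches(EMAIL_KEYWORD_TERMS, term)); // true
    }
}
```

The full address is not among the tokens stored for the text field, so the lookup fails there and succeeds against the keyword sub-field.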

Modify your query to target emailId.keyword:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "emailId.keyword": {
              "value": "21june6lambdatest7@gmail.com",
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}
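
From Java, the field name passed to the term query is an ordinary string, so one possible way to get the query above is to append ".keyword" to each field name before building the query. The sketch below is a minimal, stdlib-only illustration: the class and method names are hypothetical, and it assumes every queried field is a text field with a keyword sub-field, as in the mapping shown in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: rewrites field names so a term query targets the
// keyword sub-field. Assumes each field has a ".keyword" sub-field.
final class KeywordFieldNames {

    static final String KEYWORD_SUFFIX = ".keyword";

    // Appends ".keyword" unless the name already ends with it.
    static String toKeywordField(String fieldName) {
        return fieldName.endsWith(KEYWORD_SUFFIX) ? fieldName : fieldName + KEYWORD_SUFFIX;
    }

    // Returns a copy of the query map keyed by keyword sub-field names,
    // preserving the original iteration order.
    static Map<String, Object> toKeywordFields(Map<String, Object> queryMap) {
        Map<String, Object> result = new LinkedHashMap<>();
        queryMap.forEach((name, value) -> result.put(toKeywordField(name), value));
        return result;
    }
}
```

In the question's searchForExactDocument method, the equivalent change would be to call QueryBuilders.termQuery(name + ".keyword", value) inside the forEach, or to pass the map through a helper like the one above first.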

Update 1: Based on the comments below, modify your index mapping and settings as follows:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "emailId": {
        "type": "text",
        "analyzer":"my_analyzer"
      }
    }
  }
}
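
Note that defining the analyzer in the settings is not enough on its own: the emailId mapping has to reference it, otherwise a dynamically mapped text field still uses the standard analyzer. When the index is created from Java, one option is to send settings and mappings together as a single JSON body. The sketch below only builds that body as a string. The class name is hypothetical, and handing the string to the client (for example through CreateIndexRequest.source(String, XContentType)) is an assumption to check against your client version.

```java
// Hypothetical helper that assembles the create-index body shown above.
// It only builds a JSON string; sending it to Elasticsearch is not shown.
final class PersonIndexBody {

    static final String ANALYZER_NAME = "my_analyzer";
    static final String TOKENIZER_NAME = "my_tokenizer";

    static String createIndexBody() {
        return "{"
            + "\"settings\":{\"analysis\":{"
            // Analyzer that delegates to the custom tokenizer defined below.
            + "\"analyzer\":{\"" + ANALYZER_NAME + "\":{\"tokenizer\":\"" + TOKENIZER_NAME + "\"}},"
            // uax_url_email keeps an email address as a single token.
            + "\"tokenizer\":{\"" + TOKENIZER_NAME + "\":{\"type\":\"uax_url_email\"}}"
            + "}},"
            + "\"mappings\":{\"properties\":{"
            // The field must name the analyzer explicitly for it to take effect.
            + "\"emailId\":{\"type\":\"text\",\"analyzer\":\"" + ANALYZER_NAME + "\"}"
            + "}}"
            + "}";
    }
}
```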

Search Query:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "emailId": "21june6lambdatest7@gmail.com"
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}

Search Result:

 "hits": [
      {
        "_index": "67823510",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931471,
        "_source": {
          "emailId": "21june6lambdatest7@gmail.com"
        }
      }
    ]
  • Thank you very much for the help with this. So to do this from RestHighLevelClient Java library, would it make sense to do something like this? `Settings settings = Settings.builder() .put("analysis.analyzer.custom_uax_url_email.type", "custom") .put("analysis.analyzer.custom_uax_url_email.tokenizer", "uax_url_email") .build();` – AnOldSoul Jun 07 '21 at 04:41
  • or is there a better way to do that? – AnOldSoul Jun 07 '21 at 04:41
  • @AnOldSoul when you are querying on email id, it is always better to use UAX URL email tokenizer --> https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html Shall I provide you a working example of the same ? – ESCoder Jun 07 '21 at 04:46
  • Got it, could you point me in the direction of how I could use the UAX URL email tokenizer with my Java method for search? – AnOldSoul Jun 07 '21 at 04:48
  • @AnOldSoul you need to define an analyzer(with this tokenizer) in your index settings, and then use this defined analyzer in your index mapping for the `emailId` field. Please go through the updated part of the answer, and let me know if this resolves your issue ? – ESCoder Jun 07 '21 at 04:57
  • I updated the question with how I'm creating the "person" index now. However, the search method is still failing to find the email. I do not pass the mapping explicitly; it's created when I index the first document. Should I create the mapping explicitly as well? :( – AnOldSoul Jun 07 '21 at 05:37
  • @AnOldSoul if you have not passed the index mapping explicitly, then I think using `.keyword` will do the trick. Have you tried this as given in the first part of the answer ? – ESCoder Jun 07 '21 at 06:41
  • I'm struggling to figure out how I could pass `.keyword` to the source because it's being generated through the Java method `executeSearch()` above in the question. There doesn't seem to be any documentation on how I could update SearchRequest to include this `.keyword` – AnOldSoul Jun 07 '21 at 06:59
  • @AnOldSoul can you please accept the answer, if it helped you resolve your issue :-) – ESCoder Jun 11 '21 at 02:13
  • Sorry I'm still struggling to figure out how I could use "keyword" with the RestHighLevel Client Java implementation. Is this something you might know? – AnOldSoul Jun 11 '21 at 18:13