I am building a Java app that searches data stored in Elasticsearch (the data flows from Kafka to Logstash and then into Elasticsearch in JSON format). When I use QueryBuilders.queryStringQuery(reqId) I get all results back without problems, but when I use QueryBuilders.termQuery("routingRequestID", reqId) I get 0 hits even though the reqId is present in the ES data.


    RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));

    @GetMapping("/q/{reqId}")
    public String searchByReqId(@PathVariable("reqId") final String reqId) throws IOException {
        String[] indexes = {"devglan-log-test"};

        QueryBuilder queryBuilder = QueryBuilders.termQuery("routingRequestID", reqId);
        // QueryBuilder queryBuilder = QueryBuilders.queryStringQuery(reqId);

        SearchSourceBuilder searchSource = SearchSourceBuilder.searchSource().query(queryBuilder).from(0).size(1000);
        System.out.println(searchSource.query());

        SearchRequest searchRequest = new SearchRequest(indexes, searchSource);
        System.out.println(searchRequest.source().toString());

        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(searchResponse.toString());
        SearchHits hits = searchResponse.getHits();
        SearchHit[] searchHits = hits.getHits();
        for (SearchHit hit : searchHits) {
            System.out.println(hit.toString());
        }

        return "success";
    }

{
   took: 633,
   timed_out: false,
   _shards: {
      total: 1,
      successful: 1,
      skipped: 0,
      failed: 0
   },
   hits: {
      total: {
         value: 1,
         relation: "eq"
      },
      max_score: 1.6739764,
      hits: [
      {
         _index: "devglan-log-test",
         _type: "_doc",
         _id: "k4qAPXEBCzyTR4XVXPb2",
         _score: 1.6739764,
         _source: {
            @version: "1",
            message: "
                      {"requestorRole":"role3", "requestorGivenName":"doe", "requestorSurName":"male", 
                       "requestorOrganizationName":"dob", "reqd":"address", 
                       "requestorC":"city", "routingRequestID":"7778787898778879"}",
            @timestamp: "2020-04-03T00:45:53.917Z"
        }
      }
    ]
  }
}

Query generated by searchSource.query():

{
  "term" : {
    "routingRequestID" : {
      "value" : "2421",
      "boost" : 1.0
    }
  }
}

Query generated in searchRequest.source().toString():

{"from":0,"size":1000,"query":{"term":{"routingRequestID":{"value":"2421","boost":1.0}}}}

Results:

{"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}

All help is truly appreciated, and please don't skip the post if you know how to help.

P.J.Meisch
Russ
  • I've provided an answer based on the information you gave, and I hope it solves your issue. Otherwise, please comment if you need more clarification, ideally with the information I asked for in the first sentence of my answer. – Amit Apr 04 '20 at 03:08

4 Answers


You have not provided the mapping of your index, sample documents, or the documents you expect your search term to return, so I am guessing from the information available: the issue is most likely with how routingRequestID is mapped and with the type of query you are using.

It looks like routingRequestID is defined as text, which uses the standard analyzer by default. When you use the query string query, Elasticsearch applies the same analyzer that was used at index time, as the query string query documentation states:

The query then analyzes each split text independently before returning matching documents.

But when you use termQuery, as explained in the term query documentation, the search text is not analyzed; the exact text passed in the query is used:

Returns documents that contain an exact term in a provided field.

Solution:

If you want the same results from both queries, please try the match query, since it is an analyzed query.
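
To make the analyzed versus non-analyzed distinction above concrete, here is a toy model in plain Java. It is not Elasticsearch code: `analyze` is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and the class name, method names, and the sample value `Role3-Admin` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of analyzed vs. non-analyzed matching; NOT Elasticsearch code.
class TermVsMatchDemo {

    // Rough stand-in for the standard analyzer: lowercase, then split on
    // anything that is not a letter or a digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // term-style lookup: the query text is compared verbatim with the indexed tokens.
    static boolean termMatches(List<String> indexedTokens, String query) {
        return indexedTokens.contains(query);
    }

    // match-style lookup: the query text is analyzed first, then any token may match.
    static boolean matchMatches(List<String> indexedTokens, String query) {
        for (String token : analyze(query)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical field value, indexed through the toy analyzer.
        List<String> indexed = analyze("Role3-Admin");
        System.out.println(indexed);
        System.out.println(termMatches(indexed, "Role3-Admin"));
        System.out.println(matchMatches(indexed, "Role3-Admin"));
    }
}
```

In this toy model a purely numeric value such as 2421 produces a single unchanged token, so the term-style and match-style lookups agree for it. If a term query on such a value still returns nothing, it is worth checking whether the field exists in the mapping at all.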

Amit

I think you should check whether a document with routingRequestID = 2421 actually exists.

// This query works like the SQL: select * from XXX where routingRequestID=2421 limit 0,1000
{"from":0,"size":1000,"query":{"term":{"routingRequestID":{"value":"2421","boost":1.0}}}}
SuperPirate

Your document does not have a field routingRequestID. It has a field message, which contains the field routingRequestID.

So the query to build should be:

{
  "query": {
    "match": {
      "message.routingRequestID": "2421"
    }
  }
}
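
As a sketch of how a match query body like the one above could be sent to the `_search` endpoint without the high-level client, the following uses only the JDK's `java.net.http` API (Java 11 or later). The class and method names are hypothetical; the host, port, and index name are copied from the question. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a _search request carrying a match query.
class MatchQueryRequestSketch {

    // Minimal JSON string escaping for backslashes and quotes; not a full JSON encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Produces {"query":{"match":{"<field>":"<value>"}}}.
    static String matchBody(String field, String value) {
        return "{\"query\":{\"match\":{\"" + escape(field) + "\":\"" + escape(value) + "\"}}}";
    }

    // Host, port and index come from the question; adjust them for your cluster.
    static HttpRequest searchRequest(String index, String field, String value) {
        return HttpRequest.newBuilder(URI.create("http://127.0.0.1:9200/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(matchBody(field, value)))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(matchBody("message.routingRequestID", "2421"));
        System.out.println(searchRequest("devglan-log-test", "message.routingRequestID", "2421").uri());
    }
}
```

To actually run the search, the request can be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Whether `message.routingRequestID` exists as a field depends on the mapping: if `message` is indexed as one plain string, as the sample document in the question suggests, passing `message` as the field name is probably the variant to try.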
P.J.Meisch

So the problem was that all the information was in a single field. I solved the issue by changing the Logstash configuration and then using matchQuery. Here is what you need to add to your Logstash config file if you are using Kafka and JSON format:

input {
   kafka {
      bootstrap_servers => "kafka ip"
      topics => ["your kafka topics"]
   }
}
filter {
   json {
      source => "message"
   }
   mutate {
      remove_field => ["message"]
   }
}

By the way, I am using Elasticsearch 7.4 and the latest versions of Logstash and Kafka. Best of luck, and thanks to everyone who tried to help! I appreciate it! Here is the link to the Logstash JSON filter plugin documentation, which will guide you through the different options: https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html
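
To illustrate the effect of the filter section above on a single event, here is a toy imitation in plain Java. It is not Logstash code: the class and method names are hypothetical, and the regex only handles a flat JSON object whose values are plain strings, like the sample `message` in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy imitation of the json filter followed by remove_field; NOT Logstash code.
class JsonFilterSketch {

    // Matches "key":"value" pairs in a flat JSON object with string values only.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, Object> expandMessage(Map<String, Object> event) {
        Map<String, Object> out = new LinkedHashMap<>(event);
        Object message = out.remove("message");   // mirrors remove_field => ["message"]
        if (message instanceof String) {
            Matcher m = PAIR.matcher((String) message);
            while (m.find()) {
                out.put(m.group(1), m.group(2));   // each key becomes a top-level field
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("@version", "1");
        event.put("message",
                "{\"requestorRole\":\"role3\", \"routingRequestID\":\"7778787898778879\"}");
        System.out.println(expandMessage(event));
    }
}
```

Once routingRequestID is a top-level field of the indexed document, a query that names that field directly has something to match against.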

P.J.Meisch
Russ
  • This might solve your problem, but it is not an answer to your original question. You were asking for the correct way to query your data, and now you have changed your data to fit the query. – P.J.Meisch Apr 07 '20 at 20:00
  • Yes, it does solve it, but without this change matchQuery doesn't work. I was assuming the problem was in the query, but the problem was in the mapping. Therefore, if the mapping is not correct, using matchQuery will not help. Thanks for your help, I really appreciate it! – Russ Apr 07 '20 at 23:07