
I have a working cURL request that searches an Elasticsearch index with an aggregation query. As desired, the response includes a list of values for the aggregated field and the number of documents that match each value. For example, I am aggregating contacts by zip code, and the response includes 50 zip codes with the number of contacts in each. Great.

Now I have also written a Java function that executes the same aggregation query. How can I parse out the data nested in the aggregation response? In particular, I would like to pull out the key and docCount of each bucket. I am having trouble finding examples of this online and in the Elastic documentation.
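A minimal sketch of the bucket-reading step, assuming the Elasticsearch 7.x Java High Level REST Client (the library that provides `RestHighLevelClient`). The class and method name `BucketReader.bucketCounts` are hypothetical; `getBuckets()`, `getKeyAsString()` and `getDocCount()` are the client's accessors for the `buckets`, `key` and `doc_count` fields seen in the JSON response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// Hypothetical helper: copies key/doc_count pairs out of a terms aggregation.
public class BucketReader {

    /**
     * Returns bucket key -> doc count, in the order Elasticsearch returned
     * the buckets (highest doc_count first by default). LinkedHashMap keeps
     * that order. Returns an empty map if the aggregation is null.
     */
    public static Map<String, Long> bucketCounts(Terms terms) {
        Map<String, Long> counts = new LinkedHashMap<>();
        if (terms == null) {
            return counts; // aggregation was not found: nothing to copy
        }
        for (Terms.Bucket bucket : terms.getBuckets()) {
            // getKeyAsString() is the "key" field, getDocCount() is "doc_count"
            counts.put(bucket.getKeyAsString(), bucket.getDocCount());
        }
        return counts;
    }
}
```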

Here is what I have so far...

@GET
@Path("{indexName}")
public void searchResults(@PathParam("indexName") String indexName) throws IOException {
    RestHighLevelClient client = createHighLevelRestClient();
    int numberOfSearchHitsToReturn = 100; // defaults to 10

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    sourceBuilder.size(numberOfSearchHitsToReturn);

    GlobalAggregationBuilder aggregation = AggregationBuilders.global("agg")
            .subAggregation(AggregationBuilders.terms("home_zip_aggregation").field("home_zip.keyword"));

    sourceBuilder.aggregation(aggregation);


    SearchRequest searchRequest = new SearchRequest(indexName).source(sourceBuilder);

    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

    Aggregations aggregations = searchResponse.getAggregations();
    Terms byZipAggregation = aggregations.get("home_zip");
    System.out.print(byZipAggregation);
    System.out.print(searchResponse);

    client.close();

}

The searchResponse does include the aggregations. However, byZipAggregation is null. How can I fetch the home_zip aggregation data as an object? I have been working from this Elastic documentation:

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/_bucket_aggregations.html
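Judging from the request code above, `byZipAggregation` is probably null for two reasons: the lookup uses `"home_zip"` while the terms aggregation was named `"home_zip_aggregation"`, and that terms aggregation is a sub-aggregation of the global aggregation `"agg"`, so the top-level `Aggregations` object only contains `"agg"`. A sketch of a lookup that follows the same nesting, again assuming the 7.x high-level REST client (`ZipAggregationLookup` is a hypothetical helper name):

```java
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.global.Global;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// Hypothetical helper: walks global "agg" -> terms "home_zip_aggregation".
public class ZipAggregationLookup {

    /** Returns the nested terms aggregation, or null if either level is missing. */
    public static Terms zipTerms(Aggregations topLevel) {
        if (topLevel == null) {
            return null; // the response carried no aggregations at all
        }
        // Same name that was passed to AggregationBuilders.global(...)
        Global global = topLevel.get("agg");
        if (global == null) {
            return null;
        }
        // Sub-aggregations are read from the parent's single bucket, not from the top level.
        return global.getAggregations().get("home_zip_aggregation");
    }
}
```

In the method above, `Terms byZipAggregation = ZipAggregationLookup.zipTerms(searchResponse.getAggregations());` would take the place of the `aggregations.get("home_zip")` call.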

Here is the value of searchResponse:

{"took":7,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":51,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"contacts_6_cjluhmdki6","_type":"_doc","_id":"2093","_score":1.0,"_source":{"list_id":"6","contact_id":"2093","firstname":"DANIEL","middlename":"C","lastname":"BRYANT","email":"","home_address1":"602  STONE CIRCLE CT APT 2","home_city":"SCHAUMBURG","home_state":"IL","home_zip":"60194","home_phone":"","latitude":"42.030346","longitude":"-88.06422","location_point":"0101000020E6100000F2EF332E1C0456C03FC8B260E2034540","date_of_birth":"10/26/1991","sex":"M","registered_party":"0","created":"2019-11-13 21:24:55.825672","imported":"2019-11-13 15:24:51.006805","fulltext":"'2':8 '60194':10 '602':3 'apt':7 'bryant':2 'circle':5 'ct':6 'daniel':1 'schaumburg':9 'stone':4","home_house_num":"602","home_street_name":"STONE CIRCLE","home_street_type":"CT","home_unit_num":"APT 2","fake_col":"0.414"} ... 

More document data was here. I removed it to abbreviate this example.

}}]},"aggregations":{"global#agg":{"doc_count":51,"sterms#home_zip_aggregation":{"doc_count_error_upper_bound":0,"sum_other_doc_count":38,"buckets":[{"key":"60462","doc_count":2},{"key":"60506","doc_count":2},{"key":"60005","doc_count":1},{"key":"60030","doc_count":1},{"key":"60061","doc_count":1},{"key":"60098","doc_count":1},{"key":"60102","doc_count":1},{"key":"60126","doc_count":1},{"key":"60137","doc_count":1},{"key":"60187","doc_count":1}]}}}}
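This response also shows why only 10 zip codes come back here: a terms aggregation returns its top 10 buckets unless `size` is set on the aggregation itself, and `sum_other_doc_count` (38) counts the documents whose zip codes were left out. The `sourceBuilder.size(100)` call controls the number of hits, not buckets; presumably the cURL request sets the bucket count explicitly. The `global#` and `sterms#` prefixes are type hints the high-level client requests (the `typed_keys` parameter); the names to look up in Java are just `agg` and `home_zip_aggregation`. A sketch with an explicit bucket count (the helper `ZipTermsRequest.zipTerms` and the value 50 are illustrative assumptions):

```java
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;

// Hypothetical helper: terms aggregation that can return more than 10 buckets.
public class ZipTermsRequest {

    /** Terms aggregation on home_zip.keyword returning up to maxBuckets zip codes. */
    public static TermsAggregationBuilder zipTerms(int maxBuckets) {
        return AggregationBuilders.terms("home_zip_aggregation")
                .field("home_zip.keyword")
                .size(maxBuckets); // without this, only the top 10 buckets are returned
    }
}
```

The returned builder can be passed to `.subAggregation(...)` exactly where the original terms builder was.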

I noticed it is possible to pass the entire Aggregations object to our client code, which is written in JavaScript, and then parse out the desired fields there. However, we'd like to do all of this parsing in the Java server code so that we don't pass unnecessary data to the client. Moreover, some of the responses appear to be too big for our server to pass to the client. So, how can I parse out the bucket keys and doc counts in Java?
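Since the client only needs bucket keys and counts, the search itself can avoid returning documents: setting `size(0)` on the `SearchSourceBuilder` drops the `hits` array (which holds the bulky `_source` data), while the aggregations are still computed over every matching document. A sketch, assuming the same 7.x client, with `AggregationOnlySource.aggregationsOnly` as a hypothetical helper:

```java
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Hypothetical helper: a search body that returns aggregation results only.
public class AggregationOnlySource {

    /** Builds a source with no hits and the given aggregation attached. */
    public static SearchSourceBuilder aggregationsOnly(AggregationBuilder aggregation) {
        return new SearchSourceBuilder()
                .size(0)                   // return no hits, so no _source documents
                .aggregation(aggregation); // buckets are still computed over all matches
    }
}
```

Because this request has no query, the `global` wrapper should not change the counts; passing the terms builder directly instead of the global one would put `home_zip_aggregation` at the top level of `searchResponse.getAggregations()`. The endpoint could then return just the key/count pairs rather than the whole `SearchResponse`.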

GNG