
I am trying to fetch indexed PDF documents from my Elasticsearch index. I indexed the PDF documents using the ingest-attachment processor plugin; 2500 documents in total have been indexed along with their PDF attachments.

Now I am fetching those PDFs by searching on the PDF contents, and I am getting the error below.

org.apache.http.ContentTooLongException: entity content is too long [105539255] for the configured buffer limit [104857600]
    at org.elasticsearch.client.HeapBufferedAsyncResponseConsumer.onEntityEnclosed(HeapBufferedAsyncResponseConsumer.java:76)
    at org.apache.http.nio.protocol.AbstractAsyncResponseConsumer.responseReceived(AbstractAsyncResponseConsumer.java:131)
    at org.apache.http.impl.nio.client.MainClientExec.responseReceived(MainClientExec.java:315)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseReceived(DefaultClientExchangeHandlerImpl.java:147)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:303)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
    at com.es.utility.DocumentSearch.main(DocumentSearch.java:88)

Here is my Java API code for fetching documents from Elasticsearch:

package com.es.utility;

import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class DocumentSearch {

    private final static String ATTACHMENT = "document_attachment";
    private final static String TYPE = "doc";

    public static void main(String args[]) throws IOException
    {
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(RestClient.builder(
                new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));

        // Match query on the text extracted by the ingest-attachment processor
        SearchRequest contentSearchRequest = new SearchRequest(ATTACHMENT);
        SearchSourceBuilder contentSearchSourceBuilder = new SearchSourceBuilder();
        contentSearchRequest.types(TYPE);
        QueryBuilder attachmentQB = QueryBuilders.matchQuery("attachment.content", "activa");
        contentSearchSourceBuilder.query(attachmentQB);
        contentSearchSourceBuilder.size(50);
        contentSearchRequest.source(contentSearchSourceBuilder);
        System.out.println("Request --->" + contentSearchRequest.toString());

        SearchResponse contentSearchResponse = null;
        try {
            // The ContentTooLongException shown above is raised by this call
            contentSearchResponse = restHighLevelClient.search(contentSearchRequest);
            System.out.println("Response --->" + contentSearchResponse);
        } catch (IOException e) {
            // Report the failure instead of silently discarding it
            e.printStackTrace();
        } finally {
            restHighLevelClient.close();
        }

        if (contentSearchResponse == null) {
            return; // the search failed; without this check getHits() throws a NullPointerException
        }
        SearchHit[] contentSearchHits = contentSearchResponse.getHits().getHits();
        long contenttotalHits = contentSearchResponse.getHits().totalHits;
        System.out.println("condition Total Hits --->" + contenttotalHits);
    }
}

I am using Elasticsearch version 6.2.3.

Karthikeyan

4 Answers


You need to increase the http.max_content_length in your elasticsearch.yml config file.

By default, it is set at 100MB (100*1024*1024 = 104857600), so you probably need to set it a little higher than that.

UPDATE

It is actually a different issue, which is explained here. Basically, the default HttpAsyncResponseConsumerFactory buffers the whole response body in heap memory, but only up to 100MB by default. The workaround is to configure a larger size for that buffer, but with the 6.2 High-level REST client your only option is to go through the low-level REST client instead. Newer versions of the High-level REST client (6.4 and later) let you do this with a class called RequestOptions, as the other answers show.
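To make the failure mode concrete, here is a simplified, JDK-only model of the check that produces the message in the question. It is not the client's real code: the class `BufferLimitModel`, its method `checkContentLength`, and the use of `IllegalStateException` in place of `ContentTooLongException` are hypothetical. The sketch only assumes what the error message itself shows, namely that the response's declared length is compared against a fixed heap buffer limit (100MB by default) before the body is read.

```java
// Hypothetical, simplified model of the heap-buffer limit check.
// This is NOT the Elasticsearch client's implementation; it only mirrors the error message.
public class BufferLimitModel {

    // Default limit reported in the question's stack trace: 100 * 1024 * 1024 bytes.
    static final int DEFAULT_BUFFER_LIMIT = 100 * 1024 * 1024;

    // Rejects a response whose declared length exceeds the buffer limit.
    // IllegalStateException stands in for org.apache.http.ContentTooLongException.
    static void checkContentLength(long contentLength, int bufferLimit) {
        if (contentLength > bufferLimit) {
            throw new IllegalStateException("entity content is too long [" + contentLength
                    + "] for the configured buffer limit [" + bufferLimit + "]");
        }
    }

    public static void main(String[] args) {
        long reported = 105_539_255L; // response size from the question's error message
        try {
            checkContentLength(reported, DEFAULT_BUFFER_LIMIT);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // A 120MB limit, as in the low-level client workaround, accepts the same response.
        checkContentLength(reported, 120 * 1024 * 1024);
        System.out.println("accepted with a 120MB limit");
    }
}
```

Raising the limit only moves the threshold: the whole body is still held in heap memory, so the JVM heap must be large enough for whatever limit you choose.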

int BUFFER_SIZE = 120 * 1024 * 1024; // set the buffer to 120MB instead of the default 100MB
Map<String, String> params = Collections.emptyMap();
HttpEntity entity = new NStringEntity(contentSearchSourceBuilder.toString(), ContentType.APPLICATION_JSON);
HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory consumerFactory =
        new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(BUFFER_SIZE);
// restClient is the low-level RestClient, not the RestHighLevelClient
Response response = restClient.performRequest("GET", "/document_attachment/doc/_search", params, entity, consumerFactory);
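Because `HeapBufferedResponseConsumerFactory` takes its limit as an `int` number of bytes, a `long` constant has to be cast before it can be passed, and a hand-written product such as `4096 * 1024 * 1024` silently overflows. The hypothetical helper `BufferSizes.megabytes` below (not part of the Elasticsearch client) is a minimal JDK-only sketch that computes the byte count with an overflow check.

```java
// Hypothetical helper, not part of the Elasticsearch client: converts a size in
// megabytes (MiB) into the int byte count expected by HeapBufferedResponseConsumerFactory.
public final class BufferSizes {

    private BufferSizes() {
    }

    // Math.multiplyExact throws ArithmeticException instead of silently overflowing.
    static int megabytes(int mb) {
        if (mb <= 0) {
            throw new IllegalArgumentException("size must be positive: " + mb);
        }
        return Math.multiplyExact(mb, 1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println(megabytes(100)); // 104857600, the default limit
        System.out.println(megabytes(120)); // 125829120
        try {
            megabytes(4096); // 4 GiB does not fit in an int
        } catch (ArithmeticException e) {
            System.out.println("too large: " + e.getMessage());
        }
    }
}
```

With this helper, the first line above could read `int BUFFER_SIZE = BufferSizes.megabytes(120);`.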
Val
  • I made that change in the `elasticsearch.yml` file (`http.max_content_length: 200mb`), then restarted Elasticsearch via services.msc, but I am still getting the same error message. Do I need to do anything else? – Karthikeyan Jun 25 '18 at 10:17
  • This entry was not present in the `elasticsearch.yml` file, so I added the line `http.max_content_length: 200MB` manually at the end of the file (as the last line). It still does not work. – Karthikeyan Jun 25 '18 at 10:32
  • Do you have any idea why configuring `http.max_content_length: 200mb` in `elasticsearch.yml` is not working for ES version 6.2.3? – Karthikeyan Jun 25 '18 at 12:22
  • It's actually a different issue, I'm looking into it. – Val Jun 25 '18 at 12:29
  • Is there any workaround for this issue? Right now I am using `RestHighLevelClient`; is there any other client I can use to communicate with Elasticsearch? – Karthikeyan Jun 26 '18 at 07:36
  • I am able to build the request using `RestClient`, thanks for your help. But I am not getting a response, or not the expected one. I have raised a separate question for this: https://stackoverflow.com/questions/51039930/elasticsearch-javaapi-restclient-not-giving-response – Karthikeyan Jun 26 '18 at 09:54

This is what I did to get it working for ES 7 using RestHighLevelClient. The default buffer limit, HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory.DEFAULT_BUFFER_LIMIT, is 104857600 bytes (100MB), so this doubles it:

    RequestOptions.Builder options = RequestOptions.DEFAULT.toBuilder();
    options.setHttpAsyncResponseConsumerFactory(
            new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(2 * 104857600));
    response = client.search(searchRequest, options.build());
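As a quick arithmetic check, the doubled limit is 209715200 bytes (200MB), the same value the later answers write as `200 * 1048576`, and it comfortably covers the 105539255-byte response from the question. The class name `DoubledLimitCheck` is hypothetical; the snippet uses only the JDK.

```java
// JDK-only arithmetic check of the doubled buffer limit used above.
public class DoubledLimitCheck {

    static final int DEFAULT_LIMIT = 104_857_600;        // 100 * 1024 * 1024
    static final int DOUBLED_LIMIT = 2 * DEFAULT_LIMIT;   // value passed to the factory above
    static final long REPORTED_RESPONSE = 105_539_255L;  // size from the question's error

    public static void main(String[] args) {
        System.out.println(DOUBLED_LIMIT);                     // 209715200
        System.out.println(DOUBLED_LIMIT == 200 * 1048576);    // true
        System.out.println(DOUBLED_LIMIT - REPORTED_RESPONSE); // 104175945 bytes of headroom
    }
}
```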
Abe
Replace your code from:

    SearchResponse searchResponse =
            restHighLevelClient.search(searchRequestWithScroll, RequestOptions.DEFAULT);

to:

    RequestOptions.Builder options = RequestOptions.DEFAULT.toBuilder();
    options.setHttpAsyncResponseConsumerFactory(
            new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(2 * 104857600)); // set buffer limit to 200MB
    SearchResponse searchResponse =
            restHighLevelClient.search(searchRequestWithScroll, options.build());
uma mahesh

To follow up on a previous answer, here is a way to raise the limit for all responses when using the newer (2022+) Elasticsearch Java client. To apply RestClientOptions to an individual request instead, look at the withTransportOptions method, which creates a new client with the adjusted options; that client can be used on demand and garbage-collected afterwards.

final int MAX_RESPONSE_ENTITY_SIZE = 200 * 1048576; // 200 MB
RestClientTransport transport = ....;

RequestOptions.Builder requestOptionsBuilder = RequestOptions.DEFAULT.toBuilder();
requestOptionsBuilder.setHttpAsyncResponseConsumerFactory(
                new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(MAX_RESPONSE_ENTITY_SIZE));
RestClientOptions myOptions = new RestClientOptions(requestOptionsBuilder.build());
ElasticsearchAsyncClient esJavaClient = new ElasticsearchAsyncClient(transport, myOptions);