
We are using the Elasticsearch REST Java client (we are on Java 7, so we can't use the regular Elasticsearch Java client) to interact with our Elasticsearch servers. This all works fine except when we try to do an initial indexing of about 1.3 million documents. This runs for a while, but after a few hundred thousand documents we get a

20/06 21:27:33,153 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1) Exception in thread "pool-837116-thread-1" java.lang.OutOfMemoryError: unable to create new native thread
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at java.lang.Thread.start0(Native Method)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at java.lang.Thread.start(Thread.java:693)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:334)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:194)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
20/06 21:27:33,155 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at java.lang.Thread.run(Thread.java:724)

followed by

java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
    at org.apache.http.util.Asserts.check(Asserts.java:46)
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
    at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
    at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:343)
    at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:325)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:218)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:191)

As you can see, the Elasticsearch REST client uses Apache HTTP NIO. What I found odd is that the NIO library creates a thread for every single request (or connection?). In the log above you can see the thread (pool-837116-thread-1). There are also lots of I/O dispatcher threads with increasing numbers.

The total number of live threads doesn't seem to change much, though. So rather than reusing threads, it seems a new thread (or actually two) is created for each connect cycle. The upload is basically:

1. Create client

    restClient = RestClient.builder(new HttpHost(host.getHost(),host.getPort(),host.getProtocol())/*,new HttpHost(host.getHost(),host.getPort()+1,host.getProtocol())*/)
                            .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                                @Override
                                public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
                                    return httpClientBuilder
                                            .setDefaultCredentialsProvider(credsProvider);
                                }
                            }).setMaxRetryTimeoutMillis(30000).build();

2. Send request with json body and close client

        try{
            HttpEntity entity = new NStringEntity(json,ContentType.APPLICATION_JSON);
            Response indexResponse = restClient.performRequest("PUT", endpoint, parameters,entity,header);
            log.debug("Response #0 #1", indexResponse,indexResponse.getStatusLine());
            log.debug("Entity #0",indexResponse.getEntity());

        }finally{
            if(restClient!=null){
                log.debug("Closing restClient #0", restClient);
                restClient.close();
            }
        }

Is this normal? Why isn't Apache NIO reusing threads? Is this a problem with the Elasticsearch REST client, with Apache NIO, or with my code? I do call close on the restClient; I'm not sure what else I'm supposed to do.

I've tried to set the thread count to just 1 on the IO Reactor:

restClient = RestClient.builder(new HttpHost(host.getHost(),host.getPort(),host.getProtocol())/*,new HttpHost(host.getHost(),host.getPort()+1,host.getProtocol())*/)
                            .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                                @Override
                                public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
                                    return httpClientBuilder
                                            .setDefaultCredentialsProvider(credsProvider)
                                            .setDefaultIOReactorConfig(IOReactorConfig.custom().setIoThreadCount(1).build()); //set to one thread
                                }
                            }).setMaxRetryTimeoutMillis(30000).build();

but that didn't change anything regarding the reuse of threads.

Bartek Andrzejczak
Ben
  • Are you using bulk insert or one-by-one insert? – jvwilge Jun 21 '17 at 13:22
  • I've found the reason for the OutOfMemoryError. Although I was using a try - finally block in which I would close the client - an exception was thrown outside of that block (the block didn't cover everything D'oh). But it still looks wrong that so many threads are being created (although the number of overall threads does not significantly increase). – Ben Jun 21 '17 at 13:26
  • This is a one-by-one insert as I need to be sure that the data for each one was uploaded. It is using the same mechanism as for normal indexing. – Ben Jun 21 '17 at 13:27
  • The bulk insert also notifies you when a single record in the batch failed, so that might be an option. The bulk api is a lot faster and less resource intensive (even with very small batches) – jvwilge Jun 21 '17 at 18:03
  • Some of the JSON data has line feeds in it (which I cannot remove). Does that still work with the bulk insert? I've read that you should not pretty print the json as it is using line feeds to separate commands. Is it clever enough to skip line feeds in quotes? – Ben Jun 22 '17 at 07:52
  • Since the JSON is parsed I believe it should work. It's easy to try out with a local elasticsearch installation. – jvwilge Jun 22 '17 at 08:55
  • Have you resolved this? Is there a solution for this? All eyes on this. Please share the resolution. – Ramesh Jan 02 '18 at 05:34
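For what it's worth, the bulk format discussed in the comments is newline-delimited JSON, and line feeds inside field values are not a problem as long as they are serialized as `\n` escapes (which any JSON serializer does); only the physical newlines between lines act as record separators. A minimal sketch of building a `_bulk` body (hand-rolled escaping for illustration only; in practice a JSON library such as Jackson does this for you):

```java
// Sketch: building an Elasticsearch _bulk request body (newline-delimited
// JSON). Line feeds inside field values are serialized as \n escapes, so
// only the physical newlines between lines separate actions and documents.
public class Main {
    // Minimal JSON string escaping for illustration; a real JSON library
    // handles this (and the remaining control characters) for you.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Build one action line plus one document line per input text.
    static String buildBulk(String[] texts) {
        StringBuilder bulk = new StringBuilder();
        for (int i = 0; i < texts.length; i++) {
            bulk.append("{\"index\":{\"_id\":\"").append(i).append("\"}}\n");
            bulk.append("{\"text\":\"").append(escape(texts[i])).append("\"}\n");
        }
        return bulk.toString();
    }

    public static void main(String[] args) {
        String body = buildBulk(new String[] { "first doc", "second\ndoc with a line feed" });
        System.out.print(body); // four physical lines, no raw newline inside any value
    }
}
```

The resulting body can be sent as one `performRequest` call to the `_bulk` endpoint; each item in the response reports its own success or failure, which addresses the per-document error-reporting concern.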

2 Answers


I've found the reason for the OutOfMemoryError. Although I was using a try/finally block in which I would close the client, an exception was thrown outside of that block (the block didn't cover everything, d'oh). But it still looks wrong that so many threads are being created (even though the overall number of threads does not increase significantly).
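A minimal, runnable sketch of the corrected pattern (a plain `Closeable` stands in for `RestClient` so no server is needed, and `buildJson` is a hypothetical helper that can fail before any request is sent): every statement that can throw must sit inside the try whose finally closes the client, otherwise the client, and its reactor thread, leaks.

```java
// Sketch of the fix: everything that can throw lives inside the try block,
// so the finally that closes the client is always reached. A plain
// Closeable stands in for RestClient so the pattern runs without a server.
import java.io.Closeable;

public class Main {
    // Hypothetical helper that can fail BEFORE any request is sent.
    static String buildJson(int id) {
        if (id < 0) throw new IllegalArgumentException("bad id");
        return "{\"id\":" + id + "}";
    }

    // Returns a trace of what happened, so the close is observable.
    static String upload(int id) {
        final StringBuilder trace = new StringBuilder();
        Closeable client = new Closeable() {      // stands in for RestClient.builder(...).build()
            @Override public void close() { trace.append("closed;"); }
        };
        try {
            String json = buildJson(id);          // may throw -- but we are inside the try
            trace.append("sent;");                // restClient.performRequest(...) would go here
        } catch (IllegalArgumentException e) {
            trace.append("caught;");
        } finally {
            try { client.close(); } catch (Exception ignored) {}  // always reached
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(upload(1));   // sent;closed;
        System.out.println(upload(-1));  // caught;closed; -- the client is closed even on failure
    }
}
```

In the buggy version, `buildJson` ran before the try block started, so its exception skipped the finally and left the client (and its threads) alive.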

Ben

For the record: I also ran into this IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED error using Elasticsearch.

The root cause was on the side of my application, which tried to log in with the same account multiple times, which in turn led to multiple unnecessary save() and retrieveById() statements on the respective indices.

To diagnose a similar problem, it helps to examine the stack trace closely and, if the error occurs in several different unit tests, to look for commonalities there, such as the same method being called on the index.

Dharman
martin_wun