ElasticSearch and Apache HttpAsyncClient

Question

I'm trying to use the ElasticSearch REST API with the Apache HttpAsyncClient Java library, and I want to use a persistent, pipelining connection. Here is some test code (the output is shown in the comments):

@Test
public void testEsPipeliningClient() throws IOException, ExecutionException, InterruptedException
{
    testPost(HttpAsyncClients.createDefault());
    //201: {"_index":"test_index","_type":"test_type","_id":"AVIHYGnqdqqg_TAHm4ix","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
    testPost(HttpAsyncClients.createPipelining());
    //400: No handler found for uri [http://127.0.0.1:9200/test_index/test_type] and method [POST]
}

private void testPost(CloseableHttpAsyncClient client) throws ExecutionException, InterruptedException, IOException
{
    client.start();
    HttpPost request = new HttpPost("http://127.0.0.1:9200/test_index/test_type");
    request.setEntity(new StringEntity("{\"some_field\": \"some_value\"}"));
    Future<HttpResponse> responseFuture = client.execute(request, null);
    HttpResponse response = responseFuture.get();
    System.err.println(response.getStatusLine().getStatusCode() + ": " + EntityUtils.toString(response.getEntity()));
}

I can't understand why it works with the HttpAsyncClients.createDefault() client but not with HttpAsyncClients.createPipelining(). I also don't understand the difference between these two factory methods.

Why do I get an error response when I use createPipelining()?

I tried to see the difference using https://httpbin.org/post, but it gave the same result with both clients. I use the default ElasticSearch settings.

Thanks!

UPD1

I tried a PUT document request (PUT http://127.0.0.1/test_index/test_type/<doc id>) with the same result: it works with createDefault(), but with createPipelining() I get a similar error (No handler was found <...>).

But when I try to execute a request to create an index (PUT http://127.0.0.1/<index name>), I get a different error. See the code below:

@Test
public void testEsPipeliningClient() throws IOException, ExecutionException, InterruptedException
{
    testCreateIndex(HttpAsyncClients.createDefault());
    //200: {"acknowledged":true}
    testCreateIndex(HttpAsyncClients.createPipelining());
    //400: {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse, document is empty"}],"type":"mapper_parsing_exception","reason":"failed to parse, document is empty"},"status":400}
}

private void testCreateIndex(CloseableHttpAsyncClient client) throws ExecutionException, InterruptedException, IOException
{
    client.start();
    HttpPut request = new HttpPut("http://127.0.0.1:9200/" + RandomStringUtils.randomAlphabetic(8).toLowerCase());
    Future<HttpResponse> responseFuture = client.execute(request, null);
    HttpResponse response = responseFuture.get();
    System.err.println(response.getStatusLine().getStatusCode() + ": " + EntityUtils.toString(response.getEntity()));
}

As far as I can see from the documentation, ElasticSearch supports HTTP pipelining by default. Maybe there is something I need to change in the ES settings?

UPD2

Here are some wire logs for the code in the UPD1 section, with different logging settings:

-Dorg.apache.commons.logging.simplelog.log.org.apache.http=DEBUG -Dorg.apache.commons.logging.simplelog.log.org.apache.http.wire=INFO

http://pastebin.com/v29uvgbj

-Dorg.apache.commons.logging.simplelog.log.org.apache.http.impl.conn=DEBUG -Dorg.apache.commons.logging.simplelog.log.org.apache.http.impl.client=DEBUG -Dorg.apache.commons.logging.simplelog.log.org.apache.http.client=DEBUG -Dorg.apache.commons.logging.simplelog.log.org.apache.http.wire=DEBUG

http://pastebin.com/G9ij15d6

UPD3

I just tried replacing createDefault() with createMinimal(), and it caused the same error as createPipelining(). Any idea what in MinimalHttpAsyncClient might cause this problem? Maybe there is a way to create a pipelining client manually (with the builder classes) without this problem?

@oleg is there any way to make ES log all requests, or do I need to sniff my traffic manually? Also, I've updated the question with some new info; maybe it will be useful – coolguy, Jan 05 '16 at 13:14
Please post client-side logs: http://hc.apache.org/httpcomponents-client-4.5.x/logging.html – ok2c, Jan 05 '16 at 13:23
@oleg I've added some logs for the index create request. If you need logs with any other logging settings, I can provide them; just tell me which settings to use – coolguy, Jan 05 '16 at 13:44
Hmm... I just tried replacing `createDefault()` with `createMinimal()`, and it caused the same error as `createPipelining()`. Any idea what in `MinimalHttpAsyncClient` might cause this problem? Maybe there is a way to create a pipelining client manually (with the builder classes) without this problem? – coolguy, Jan 05 '16 at 13:53

score 2 · Accepted Answer · answered Jan 05 '16 at 13:53

The server must be choking on the absolute request URI in the request line:

[DEBUG] wire - http-outgoing-1 >> "PUT http://127.0.0.1:9200/ydiwdsid HTTP/1.1[\r][\n]"

HttpAsyncClient in the pipelining mode employs a minimal protocol processing chain. It does not attempt to rewrite the request URI of the request object.

For your particular case request pipelining does not seem to make a lot of sense. Not to mention that unless you are submitting requests in batches you are not even using pipelined execution.
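
To make the difference concrete, here is a small sketch of the two request-line forms. It uses only `java.net.URI`, not HttpAsyncClient itself, and the names `RequestLineSketch`, `absoluteForm` and `originForm` are hypothetical, chosen for illustration. The first form matches the wire log above; the second is what the request line would look like once the URI has been rewritten to a path.

```java
import java.net.URI;

final class RequestLineSketch {

    // Request line with the URI left untouched, as seen in the wire log.
    static String absoluteForm(String method, URI uri) {
        return method + " " + uri.toASCIIString() + " HTTP/1.1";
    }

    // Request line with the URI reduced to its path (plus query, if any).
    static String originForm(String method, URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // an empty path is sent as "/"
        }
        String query = uri.getRawQuery();
        String target = (query == null) ? path : path + "?" + query;
        return method + " " + target + " HTTP/1.1";
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://127.0.0.1:9200/ydiwdsid");
        System.out.println(absoluteForm("PUT", uri)); // PUT http://127.0.0.1:9200/ydiwdsid HTTP/1.1
        System.out.println(originForm("PUT", uri));   // PUT /ydiwdsid HTTP/1.1
    }
}
```

Since the minimal processing chain does not do this rewrite, the caller has to supply the path-only form and pass the host separately.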

answered Jan 05 '16 at 13:53

ok2c


How can I check whether it's really about the absolute request URI? About pipelined execution: with the `createPipelining()` client, only one connection is used every time I perform a request (to the same host) through this client, right? And about batches: could you please explain how I could try that? My real goal is to keep submitting new requests without waiting for the responses (I do want to receive them eventually, but I don't want waiting for a response to block new requests). Which client should I use for that? – coolguy Jan 05 '16 at 14:00
I realized that when I execute my code with `MinimalHttpAsyncClient`, it somehow creates an index called `http:` in ElasticSearch. The same thing happens when I run the following command: `nc 127.0.0.1 9200 < absolute.http`, where absolute.http is: http://pastebin.com/zgfSHcNG . But when I try `nc 127.0.0.1 9200 < relative.http`, where relative.http is: http://pastebin.com/b0yFCAB3 , it works as expected. I couldn't figure out how to receive the responses with `nc` or how to do the same thing using `curl`, but I think the response in the case of absolute.http would be `400` (like in the question). – coolguy Jan 05 '16 at 14:45
I managed to solve this problem (see Val's answer). But I'm still wondering how to code this correctly if I want to use HTTP pipelining and avoid being slowed down by waiting for responses. – coolguy Jan 05 '16 at 15:11
One needs to submit requests in batches to take advantage of request pipelining. See http://hc.apache.org/httpcomponents-asyncclient-4.1.x/httpasyncclient/examples/org/apache/http/examples/nio/client/AsyncClientPipelined.java – ok2c Jan 05 '16 at 19:35
Thanks, I think I'll do it. But I don't fully understand why I can't take advantage of request pipelining without using request batches. If I execute one request, receive a `Future` for it and then execute another request, it's not necessary to wait until the `Future` reports `isDone()`, right? Also, am I right that only one connection is opened in that case? – coolguy Jan 06 '16 at 05:47
Both blocking and non-blocking HC deal with connections in the connection pool the same way: they lease a connection from the pool, execute whatever message exchanges are necessary (authentication, redirects, etc.) and release the connection back to the pool. In the pipelining model, unless one executes multiple requests at once, HC releases the connection upon completion of the first exchange. Naturally, nothing gets pipelined – ok2c Jan 06 '16 at 15:33
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/99956/discussion-between-coolguy-and-oleg). – coolguy Jan 06 '16 at 15:48

score 2 · Answer 2 · answered Jan 05 '16 at 14:57

Actually, you simply need to extract the host from the URL and create the HttpPost object with only the absolute path (no scheme, host or port). See the changes on the second, third and fifth lines below:

client.start();
HttpHost targetHost = new HttpHost("127.0.0.1", 9200);
HttpPost request = new HttpPost("/test_index/test_type");
request.setEntity(new StringEntity("{\"some_field\": \"some_value\"}"));
Future<HttpResponse> responseFuture = client.execute(targetHost, request, null);
HttpResponse response = responseFuture.get();
System.out.println(response.getStatusLine().getStatusCode() + ": " + EntityUtils.toString(response.getEntity()));

Making these three changes and running the code again yields this:

201: {"_index":"test_index","_type":"test_type","_id":"AVISSimIZHOoPG8ibOyF","_version":1,"created":true}
201: {"_index":"test_index","_type":"test_type","_id":"AVISSimjZHOoPG8ibOyG","_version":1,"created":true}
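
If the full URL arrives as a single string, the split does not have to be hard-coded. Below is a minimal sketch using only `java.net.URI`; the `TargetSplit` class and its fields are hypothetical names introduced for illustration, and feeding the result into `HttpHost` and `HttpPost` is shown only in a comment.

```java
import java.net.URI;

final class TargetSplit {
    final String scheme;
    final String host;
    final int port;     // -1 when the URL carries no explicit port
    final String path;  // path plus query, always starting with "/"

    private TargetSplit(String scheme, String host, int port, String path) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.path = path;
    }

    // Splits a full URL into the connection target and the request path.
    static TargetSplit of(String url) {
        URI uri = URI.create(url);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new TargetSplit(uri.getScheme(), uri.getHost(), uri.getPort(), path);
    }

    public static void main(String[] args) {
        TargetSplit t = TargetSplit.of("http://127.0.0.1:9200/test_index/test_type");
        // With HttpAsyncClient, these values would feed
        //   new HttpHost(t.host, t.port, t.scheme)  and  new HttpPost(t.path)
        System.out.println(t.host + ":" + t.port + " " + t.path);
    }
}
```

Keeping the query string attached to the path means search-style requests are preserved in the path-only form as well.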

seems like it's not possible to set second bounty at Stack Overflow :( But anyway thank you very much! — coolguy, Jan 06 '16 at 14:12
