0

I've a .NET Core application that sends bulk data to ElasticSearch 7, using ElasticSearch.NET 7 & Nest 7.

The application runs in a docker container and works perfectly for about 1 hour. Then we see an error message & the application stops working.

The error that I receive is a SniffFailure. The connection to ES is retried but mostly ends in a "MaxTimeOutReached".

It looks like - for some reason - the connection to ES goes down on a regular basis. The "quick fix" solution that we're currently using is restarting the docker container on error, but we would like to resolve this issue.

As you can see in the log below: I'm able to send docs to ES and then at some point I receive this error. Mostly after running my application for 1h.

2020-08-05 11:22:01.5553| INFO| 001 ES_indexName processed documents 08/05/2020 11:22:01 1 in 202.3803 ms
2020-08-05 11:29:28.9633| INFO| 001 ES_indexName processed documents 08/05/2020 11:29:28 1 in 179.7982 ms
2020-08-05 11:40:08.4666| INFO| 001 ES_indexName processed documents 08/05/2020 11:40:07 1 in 291.5695 ms
2020-08-05 11:47:26.2924|ERROR| failed Invalid NEST response built from a unsuccessful () low level call on POST: /ES_indexName/_bulk
# Invalid Bulk items:
# Audit trail of this API call:
 - [1] PingFailure: Node: https://ES_node1:port/ Exception: PipelineException Took: 00:01:40.0193513
 - [2] SniffOnFail: Took: 00:05:00.0132241
 - [3] SniffFailure: Node: https://ES_node2:port/ Exception: PipelineException Took: 00:01:40.0036955
 - [4] SniffFailure: Node: https://ES_node3:port/ Exception: PipelineException Took: 00:01:40.0019296
 - [5] SniffFailure: Node: https://ES_node1:port/ Exception: PipelineException Took: 00:01:40.0005639
 - [6] MaxTimeoutReached:
# OriginalException: Elasticsearch.Net.ElasticsearchClientException: Maximum timeout reached while retrying request. Call: unknown resource
 ---> Elasticsearch.Net.PipelineException: Failed sniffing cluster state.
 ---> System.AggregateException: One or more errors occurred. (An error occurred trying to write the request data to the specified node.) (An error occurred trying to write the request data to the specified node.) (An error occurred trying to write the request data to the specified node.)
 ---> Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
 ---> (Inner Exception #1) Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)<---

 ---> (Inner Exception #2) Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)<---

   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)
   at Elasticsearch.Net.RequestPipeline.SniffOnConnectionFailureAsync(CancellationToken cancellationToken)
   at Elasticsearch.Net.Transport`1.PingAsync(IRequestPipeline pipeline, Node node, CancellationToken cancellationToken)
   at Elasticsearch.Net.Transport`1.RequestAsync[TResponse](HttpMethod method, String path, CancellationToken cancellationToken, PostData data, IRequestParameters requestParameters)
   --- End of inner exception stack trace ---
# Audit exception in step 1 PingFailure:
Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.PingAsync(Node node, CancellationToken cancellationToken)
# Audit exception in step 3 SniffFailure:
Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)
# Audit exception in step 4 SniffFailure:
Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)
# Audit exception in step 5 SniffFailure:
Elasticsearch.Net.PipelineException: An error occurred trying to write the request data to the specified node.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.RequestPipeline.SniffAsync(CancellationToken cancellationToken)
# Request:
<Request stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>
# Response:
<Response stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>
 400039.9975 ms MUST RETRY

Anyone had a similar issue or any suggestions on to resolve (or debug) this? It's currently unclear why we lose connection to ES.

Thanks

  • Are you keeping a connection to your ES server open for the entire duration? – DavidG Oct 13 '20 at 11:40
  • The low level TCP connection may be closing. Do you send continuous data or are there long idle periods. The server may be closing connection that is idle. Is the time always that same or does it vary? You may be getting errors on the channel and the connection is closing due to errors. You can using from cmd.exe >PING -t -l65000 IP which will send long messages to determine if the line is good. – jdweng Oct 13 '20 at 13:32
  • @jdweng Yes, I'm keeping the connection open as I'm sending continious data.The time is always around 1h (maybe a 5-10 min later). – Alice_Heartilly Oct 14 '20 at 12:40
  • You could use a sniffer to debug. HTTP uses one or more TCP packets as transport layer. So if a connection terminates you would see a [FIN] in the TCP packets. If errors were occurring you would see retires of the TCP packets (sequence number would be the same of the repeats). You would also see missing ACKs. You can also see the status changing using from cmd.exe >Netstat -a – jdweng Oct 14 '20 at 13:21

0 Answers0