
Looking for a way to debug why the backend for NiFi is failing. I created a NiFi cluster (version 1.9.0, HDF 3.1.1.4, Ambari 2.7.3) on Google Cloud. I created an HTTPS load balancer with an HTTPS-terminating front end, and the back end is the instance group for the SSL-enabled NiFi cluster. I get a 502 backend error in the browser when I hit the URL for the load balancer. Is there a way for Google Cloud to log the error? There must be an error returned somewhere to troubleshoot the root cause. I don't see messages in the NiFi log or in the VM instance's /var/log/messages, and Stackdriver hasn't shown me errors. I created the keystore and truststore and followed the NiFi SSL-enablement instructions. It might be related to the SSL configs, or possibly the firewall rules are not correct, but I am looking for more helpful information to find the error.

gwrose
  • Can you share your load balancer configuration? And did you try to reach the cluster directly, without the load balancer? Is that working? – guillaume blaquiere Nov 13 '19 at 17:04
  • I tried a curl against the NiFi app on nifi2 while sitting on nifi1, and found an Unknown CA error in tcpdump in the TLS handshake between nifi1 and nifi2. The load balancer health check is also running, and that attempted TLS connection resulted in another error, Bad Certificate. The backend of the load balancer is HTTPS, on the instance group port where SSL-enabled NiFi is listening. The health check is failing, so I believe no traffic is being sent. – gwrose Nov 15 '19 at 19:23

2 Answers


If I am understanding the question properly, you are looking for a way to get the HTTPS load balancer logs for backend errors so you can find the root cause. A load balancer returns a 502 error when a backend service or backend VM is unhealthy. If Stackdriver logging is enabled, you can find these entries with an advanced filter, or by selecting the load balancer name in the Logs Viewer and searching for 502:

Advanced filter for 502 responses due to failures to connect to backends:

resource.type="http_load_balancer"
resource.labels.url_map_name="[URL Map]"
httpRequest.status=502
jsonPayload.statusDetails="failed_to_connect_to_backend"

Advanced filter for 502 responses due to backend timeouts:

resource.type="http_load_balancer"
resource.labels.url_map_name="[URL Map]"
httpRequest.status=502
jsonPayload.statusDetails="backend_timeout"

Advanced filter for 502 responses due to prematurely closed connections:

resource.type="http_load_balancer"
resource.labels.url_map_name="[URL Map]"
httpRequest.status=502
jsonPayload.statusDetails="backend_connection_closed_before_data_sent_to_client"

The URL map name is the same as the load balancer name when the HTTP(S) load balancer is created through the Cloud Console. If you create the various components of the load balancer manually, use the name of the URL map in the advanced filter.
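You can also run the same filters from the command line. A minimal sketch using gcloud logging read, assuming the Cloud SDK is installed and the 502s occurred within the last day; the limit and output format are just illustrative:

gcloud logging read \
  'resource.type="http_load_balancer" AND httpRequest.status=502' \
  --freshness=1d --limit=20 --format=json

The statusDetails field of each returned entry tells you which of the failure modes below you are hitting.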

Most common root causes for "failed_to_connect_to_backend" are:

1. Firewall rules blocking traffic (see the sketch after this list),
2. Web server software not running on the backend instance,
3. Web server software misconfigured on the backend instance,
4. Server resources exhausted and not accepting connections: CPU usage too high to respond, memory usage too high, process killed, maximum number of workers spawned and all busy, or maximum established TCP connections reached,
5. A poorly written server implementation struggling under load, or exhibiting non-standard behavior.
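Google health checks and proxied load balancer traffic originate from the ranges 130.211.0.0/22 and 35.191.0.0/16, so those ranges must be allowed to reach the backend port. A minimal sketch of such a rule; the rule name, network, and the NiFi HTTPS port 9443 are assumptions to adapt to your setup:

gcloud compute firewall-rules create allow-lb-health-checks \
  --network=default --direction=INGRESS --action=ALLOW \
  --rules=tcp:9443 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16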

Most common root causes for "backend_timeout" are:

1. The backend instance took longer than the backend service timeout to respond, meaning either the application is overloaded or the backend service timeout is set too low (see the sketch after this list),
2. The backend instance didn't respond at all (for example, it crashed during the request).
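If the application legitimately needs more time than the backend service timeout allows (30 seconds by default), you can raise it. A minimal sketch, assuming a global backend service named nifi-backend and a 120-second timeout, both of which are placeholders:

gcloud compute backend-services update nifi-backend \
  --global --timeout=120s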

The most common root cause for "backend_connection_closed_before_data_sent_to_client" is that the keepalive (HTTP idle) timeout of the web server software running on the backend instance is shorter than the fixed 10-minute keepalive timeout of the GFE. In that situation the backend may close a connection too soon, while the GFE is still sending it an HTTP request.
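The usual fix is to raise the backend's keepalive timeout above the GFE's 10 minutes. NiFi itself runs on an embedded Jetty server, but as an illustration of the idea on a generic nginx backend (the 620-second value, slightly above the GFE's 600-second idle timeout, is the commonly cited recommendation):

# illustrative nginx setting; adjust for whatever server your backend runs
http {
    keepalive_timeout 620s;
}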

Shanewaz
  • I am still not seeing any logs in the Logs Viewer when I select Cloud HTTP Load Balancer, and I have logging enabled. We were able to start tcpdump, and Wireshark revealed a Bad Certificate error in the TLSv1.2 handshake between the VM instance where NiFi is running and the IP range for the load balancer health checker, 35.191.x.x. I think I should open a new question for this specific error. – gwrose Nov 15 '19 at 19:18

The previous answer was spot on. The NiFi SSL configuration was misconfigured, causing the backend health check to fail with a Bad Certificate error. I will open a new question to address the NiFi SSL configuration.
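For anyone who hits the same Bad Certificate error: a quick way to see which certificate the SSL-enabled NiFi node actually presents to the health checker is to run a TLS handshake against it from the instance, and to compare that with the keystore contents. A minimal sketch; the port 9443 and the keystore path are assumptions to adapt:

# show the certificate chain NiFi presents on its HTTPS port (9443 assumed)
openssl s_client -connect localhost:9443 -showcerts </dev/null

# list what the keystore NiFi was configured with actually contains (path assumed)
keytool -list -v -keystore /opt/nifi/conf/keystore.jks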

gwrose