Slack Bot deployed in Cloud Foundry returns 502 Bad Gateway errors

Question

In Slack, I have set up an app with a slash command. The app works well when I use a local ngrok server.

However, when I deploy the app server to PCF, it is returning 502 errors:

[CELL/0] [OUT] Downloading droplet...
[CELL/SSHD/0] [OUT] Exit status 0
[APP/PROC/WEB/0] [OUT] Exit status 143
[CELL/0] [OUT] Cell e6cf018d-0bdd-41ca-8b70-bdc57f3080f1 destroying container for instance 28d594ba-c681-40dd-4514-99b6
[PROXY/0] [OUT] Exit status 137
[CELL/0] [OUT] Downloaded droplet (81.1M)
[CELL/0] [OUT] Cell e6cf018d-0bdd-41ca-8b70-bdc57f3080f1 successfully destroyed container for instance 28d594ba-c681-40dd-4514-99b6
[APP/PROC/WEB/0] [OUT] ⚡️ Bolt app is running! (development server)
[OUT] [APP ROUTE] - [2021-12-23T20:35:11.460507625Z] "POST /slack/events HTTP/1.1" 502 464 67 "-" "Slackbot 1.0 (+https://api.slack.com/robots)" "10.0.1.28:56002" "10.0.6.79:61006" x_forwarded_for:"3.91.15.163, 10.0.1.28" x_forwarded_proto:"https" vcap_request_id:"7fe6cea6-180a-4405-5e5e-6ba9d7b58a8f" response_time:0.003282 gorouter_time:0.000111 app_id:"f1ea0480-9c6c-42ac-a4b8-a5a4e8efe5f3" app_index:"0" instance_id:"f46918db-0b45-417c-7aac-bbf2" x_cf_routererror:"endpoint_failure (use of closed network connection)" x_b3_traceid:"31bf5c74ec6f92a20f0ecfca00e59007" x_b3_spanid:"31bf5c74ec6f92a20f0ecfca00e59007" x_b3_parentspanid:"-" b3:"31bf5c74ec6f92a20f0ecfca00e59007-31bf5c74ec6f92a20f0ecfca00e59007"

Besides endpoint_failure (use of closed network connection), I also see:

x_cf_routererror:"endpoint_failure (EOF (via idempotent request))"
x_cf_routererror:"endpoint_failure (EOF)"

In PCF, I created an https:// route for the app. This is the URL I put into my Slack App's "Redirect URLs" section as well as my Slash command URL.

In Slack, the URLs end with /slack/events

This configuration all works well locally, so I guess I missed a configuration point in PCF.

Manifest.yml:

applications:
- name: kafbot
  buildpacks:
    - https://github.com/starkandwayne/librdkafka-buildpack/releases/download/v1.8.2/librdkafka_buildpack-cached-cflinuxfs3-v1.8.2.zip
    - https://github.com/cloudfoundry/python-buildpack/releases/download/v1.7.48/python-buildpack-cflinuxfs3-v1.7.48.zip
  instances: 1
  disk_quota: 2G
#  health-check-type: process
  memory: 4G
  routes:
    - route: "kafbot.apps.prod.fake_org.cloud"
  env:
    KAFKA_BROKER: 10.32.17.182:9092,10.32.17.183:9092,10.32.17.184:9092,10.32.17.185:9092
    SLACK_BOT_TOKEN: ((slack_bot_token))
    SLACK_SIGNING_SECRET: ((slack_signing_key))
  command: python app.py

What's up with your health check? Do you have that set to process? Was something failing with the default health check? Also, what port is your Python application listening on? The platform is going to pass in a `$PORT` env variable with a value in it (it is always 8080, but could change in the future). You need to make sure your app is listening on that port. Also, listen on `0.0.0.0`, not `localhost` or `127.0.0.1`. — Daniel Mikusa, Dec 27 '21 at 18:59
Changing the Python app to port 8080 solved the issue. Thanks! Please feel free to add an answer to this question and I will mark it as correct. — Simon Tower, Jan 02 '22 at 02:44

score 4 · Accepted Answer · answered Jan 03 '22 at 20:21

When x_cf_routererror says endpoint_failure, it means that Gorouter sent the request to the application, but the application did not handle it for some reason.

From there, you want to look at response_time. If the response time is high (typically the same value as the timeout, like 60s almost exactly), it means your application is not responding quickly enough. If the value is low, it could mean that there is a connection problem, like Gorouter tries to make a TCP connection and cannot.
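The triage described above can be sketched as a small script. This is only an illustration: the 60-second timeout and the one-second margin are assumptions taken from the "like 60s" remark, the function name is hypothetical, and the sample line is a trimmed copy of the access log from the question.

```python
import re

# Assumption: the router timeout is 60s; check your platform's actual setting.
ASSUMED_TIMEOUT_SECONDS = 60.0


def triage_router_log(line, timeout=ASSUMED_TIMEOUT_SECONDS):
    """Return (router_error, response_time, hint) for one access log line.

    Returns None when the line lacks either field.
    """
    error_match = re.search(r'x_cf_routererror:"([^"]*)"', line)
    time_match = re.search(r'response_time:([0-9.]+)', line)
    if not error_match or not time_match:
        return None
    router_error = error_match.group(1)
    response_time = float(time_match.group(1))
    # A response time close to the timeout suggests a slow application;
    # a very small one suggests the connection itself failed.
    if response_time >= timeout - 1.0:
        hint = "app too slow to respond"
    else:
        hint = "connection problem"
    return router_error, response_time, hint


# Trimmed from the log line in the question (other fields omitted).
sample = (
    'response_time:0.003282 gorouter_time:0.000111 '
    'x_cf_routererror:"endpoint_failure (use of closed network connection)"'
)
print(triage_router_log(sample))
```

For the log line in the question, the response time is about 3 ms, so this falls on the "connection problem" side.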

Normally this shouldn't happen. The system has a health check in place that makes sure the application is up and listening for requests. If it's not, the application will not start correctly.

In this particular case, the manifest has health-check-type: process, which disables the standard port-based health check and uses a process-based health check instead. This allows the application to start up even if it's not listening on the right port. Thus, when Gorouter sends a request to the application on the expected port, it cannot connect. Side note: typically, you'd only use process-based health checks if your application is not listening for incoming requests.
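To see why a port-based check would have caught this, here is a rough sketch of what such a check amounts to: try to open a TCP connection and report whether it succeeds. It is not the platform's actual implementation, and the function name is hypothetical.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is accepting connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running something like `port_accepts_connections("127.0.0.1", 8080)` from inside the container (for example over `cf ssh`) would return False if the app is bound to a different port.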

The platform is going to pass in a $PORT env variable with a value in it (it is always 8080, but could change in the future). You need to make sure your app is listening on that port. Also, you want to listen on 0.0.0.0, not localhost or 127.0.0.1.

This should ensure that Gorouter can deliver requests to your application on the agreed-upon port.
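As a minimal sketch of the port and bind-address advice, the following uses only the Python standard library rather than the asker's actual app.py, which is not shown. The fallback port 3000 for local runs and the names `listen_address`, `EventsHandler`, and `make_server` are assumptions for illustration.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def listen_address(env=None):
    """Build the (host, port) pair to bind to.

    Reads the platform-provided PORT variable; 3000 is an assumed
    fallback for local development only.
    """
    env = os.environ if env is None else env
    port = int(env.get("PORT", "3000"))
    # Bind to all interfaces so the router can reach the app;
    # localhost/127.0.0.1 would only accept connections from inside the container.
    return ("0.0.0.0", port)


class EventsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body, then acknowledge with a plain 200.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")


def make_server(env=None):
    """Create (but do not start) an HTTP server on the resolved address."""
    return HTTPServer(listen_address(env), EventsHandler)


# To run: make_server().serve_forever()
```

With Bolt for Python, the analogous step would presumably be to pass the same value to the start call, for example `app.start(port=int(os.environ.get("PORT", 3000)))`.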

I'm not sure 60 seconds is typical, given this quote: "Cloud foundry expects a 200 response back within 1 second, otherwise it determines the app to be unhealthy and restarts it", source: https://danielsinnott.com/blog/27 — Eduardo, May 06 '22 at 16:35
You're right that the timeout for health checks is 1s by default, but requests to paths that are not health-check-related can run for different durations and have their own timeouts. — Daniel Mikusa, May 06 '22 at 20:18
I had this error pop up again in another PCF deployment. Restarting and restaging the instances had no effect; it was as if we were stuck with zombie VMs. Destroying the app and re-adding it to PCF solved the issue. — Simon Tower, Jan 28 '23 at 00:05
