We host a constellation of services entirely in AWS (no external dependencies as far as these services go). We periodically see health-check failures (502s as public services contact the internal services' ALBs), as frequently as every hour or two, yet the services themselves experience no disruption whatsoever.
I've tried all manner of health-check settings: long and short intervals and timeouts, high and low healthy/unhealthy thresholds (the counts before a target is considered healthy or unhealthy). When I've looked at the HTTP access log in the past, I believe there were no records at all for the failed requests; I'd assumed the service went down before the request completed and a record could be written. We have regular activity, but nothing that would be considered high-volume, and congestion seems unlikely: it would disrupt normal requests too, which, per the above, it doesn't.
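One thing that has helped me narrow issues like this down is asking the ALB itself why it marked a target unhealthy, rather than inferring from the backend's logs. A sketch, with a placeholder target-group ARN:

```shell
# Placeholder ARN: substitute your own target group's ARN.
# The TargetHealth.Reason / Description fields distinguish, e.g., a
# health-check timeout from an unexpected status code.
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-service/abc123
```

Polling this during one of the failure windows (or graphing the `UnHealthyHostCount` CloudWatch metric per target group) should tell you whether the ALB saw a timeout, a connection failure, or a wrong status code.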
We have more than one load-balanced instance per service.
This is a long-running issue; I've periodically searched and tried whatever reasonable approaches have been suggested, but I've had no luck learning anything further.
The platform is largely uWSGI (Python) behind Nginx.
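For what it's worth, one commonly reported cause of intermittent ALB 502s is the target closing an idle keep-alive connection just before the ALB tries to reuse it; the ALB's idle timeout defaults to 60 seconds, so the backend's keep-alive timeout should exceed that. I haven't confirmed this is my situation, but ruling it out in the nginx config fronting uWSGI would look something like (illustrative values):

```nginx
# Keep idle connections open longer than the ALB idle timeout (60s by
# default), so the ALB never reuses a connection nginx already closed.
keepalive_timeout 75s;
keepalive_requests 1000;
```

nginx's default `keepalive_timeout` is 75s, so this only matters if it has been lowered, or if the ALB's idle timeout has been raised above it.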
How could I further debug this?
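In case it's useful to anyone answering: I understand ALB access logs (disabled by default) record every request the ALB handles, including ones that never reached a target, which would sidestep my empty-backend-log problem. A sketch with placeholder ARN and bucket names:

```shell
# Enable access logging on the ALB (placeholders throughout).
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/def456 \
  --attributes Key=access_logs.s3.enabled,Value=true \
               Key=access_logs.s3.bucket,Value=my-alb-logs

# After syncing the gzipped logs down from S3: field 9 of each entry is
# elb_status_code and field 10 is target_status_code; a "-" in field 10
# means no target ever produced a response for that request.
zcat *.log.gz | awk '$9 == 502 {print $2, $4, $5, $9, $10}'
```

Whether field 10 is `-` (connection-level failure) or a real status code should indicate whether the problem is in front of nginx or behind it.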