My team is looking to better understand current workflows associated with troubleshooting and remediating application failures in Kubernetes environments. How do you typically go about troubleshooting K8s app failures? What are the biggest bottlenecks or frustrations in the process?
Additional context: We’re building a platform that determines causality and automates the detection and remediation of application failures. It’s currently in early access; we’re looking for input from this community to make sure it solves a real pain. If you’re interested in testing out the early access product and providing feedback, you can sign up here. (This is certainly not meant to be vendor spam – please comment if inappropriate and I can delete.)
An example that we've tried is documented here: https://www.causely.io/blog/managing-database-connection-failures/