I am trying to deploy Elastic Cloud on Kubernetes (ECK). The first time I tried, I managed to start Elasticsearch on the Kubernetes cluster. But now, for some reason, when I try to deploy the same quickstart Elasticsearch, it never even creates a pod for it, and no services are created either. I think this is because elastic-operator shuts down at startup.

I have tried deleting all-in-one.yml and redeploying it. I have also tried resetting the cluster and redeploying. Neither worked. Other, non-Elastic pods deploy successfully. The logs for elastic-operator are below (prettified):

        {
   "log.level":"info",
   "@timestamp":"2021-01-19T13:04:41.881Z",
   "log.logger":"manager",
   "message":"maxprocs: Updating GOMAXPROCS=1: determined from CPU quota",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0"
}{
   "log.level":"info",
   "@timestamp":"2021-01-19T13:04:41.882Z",
   "log.logger":"manager",
   "message":"Setting default container registry",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0",
   "container_registry":"docker.elastic.co"
}{
   "log.level":"info",
   "@timestamp":"2021-01-19T13:04:41.882Z",
   "log.logger":"manager",
   "message":"Setting up scheme",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0"
}{
   "log.level":"info",
   "@timestamp":"2021-01-19T13:04:41.887Z",
   "log.logger":"manager",
   "message":"Operator configured to manage all namespaces",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0"
}{
   "log.level":"error",
   "@timestamp":"2021-01-19T13:05:11.888Z",
   "log.logger":"controller-runtime.manager",
   "message":"Failed to get API Group-Resources",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0",
   "error":"Get \"https://10.96.0.1:443/api?timeout=1m0s\": dial tcp 10.96.0.1:443: i/o timeout",
   "error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/manager.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/manager/manager.go:279\ngithub.com/elastic/cloud-on-k8s/cmd/manager.startOperator\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:484\ngithub.com/elastic/cloud-on-k8s/cmd/manager.doRun.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:319"
}{
   "log.level":"error",
   "@timestamp":"2021-01-19T13:05:11.888Z",
   "log.logger":"manager",
   "message":"Failed to create controller manager",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0",
   "error":"Get \"https://10.96.0.1:443/api?timeout=1m0s\": dial tcp 10.96.0.1:443: i/o timeout",
   "error.stack_trace":"github.com/elastic/cloud-on-k8s/cmd/manager.startOperator\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:486\ngithub.com/elastic/cloud-on-k8s/cmd/manager.doRun.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:319"
}{
   "log.level":"error",
   "@timestamp":"2021-01-19T13:05:11.888Z",
   "log.logger":"manager",
   "message":"Shutting down due to error",
   "service.version":"1.3.1+a0a0a212",
   "service.type":"eck",
   "ecs.version":"1.4.0",
   "error":"Get \"https://10.96.0.1:443/api?timeout=1m0s\": dial tcp 10.96.0.1:443: i/o timeout",
   "error.stack_trace":"github.com/elastic/cloud-on-k8s/cmd/manager.doRun\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:327\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/main.go:30\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"
}
Krishna Chaurasia
kara
  • Could you provide more information about your environment and setup? You want to deploy Elastic Cloud on Kubernetes: which cloud provider are you using (GCP, AWS, Azure), and which version? What do you mean by `reset cluster`? Can you provide the exact steps, along with the tutorial or configuration YAML you used to deploy? Did you see any pod in a `pending` or `crashloopbackoff` state in any namespace? Did you get any errors when you tried to redeploy Elastic? – PjoterS Jan 20 '21 at 07:54

1 Answer

Check which service owns the IP address the application is trying to reach.

In your example, try $ kubectl get services --all-namespaces | grep 10.96.0.1

Most probably it's trying to reach the kubernetes service (the API server's ClusterIP), and the i/o timeout means the operator pod cannot reach that service because of some network restriction.
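To confirm the timeout independently of the operator, you can attempt the same TCP connection yourself. The following is only a minimal sketch, assuming you can run Python inside a pod on the affected cluster (for example a temporary debug pod); the helper name `can_reach` is hypothetical, and the address is the one from the operator log.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example, run from inside a pod (address taken from the operator log):
#   can_reach("10.96.0.1", 443)
```

If this returns False from inside a pod but the API server is reachable from the node itself, the problem is likely in pod networking (CNI, network policy, or routing) rather than in the operator.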

Potential issues:

  1. A network policy is blocking the traffic, and a policy that allows the required ingress and egress is needed (see the Kubernetes NetworkPolicy docs).
  2. A routing table issue is preventing traffic from reaching the service IP.

There might be other causes, but the above should give you a starting point. Feel free to suggest more possible causes and I'll update the answer.

Captain Levi
  • If you're somewhat familiar with Kubernetes, then you know: that 10.96.0.1 address: that's your kubernetes API (99.99% of the time, even in very large corporate setups, with weird kube distros: they all use that same SDN subnet for services ... and kube is always the first address allocated, to that IP). The kubernetes ClusterIP Service address, in the default namespace. At that stage, we can't say much: should check on kubernetes API server logs. Which makes this answer a nice mix of uselessness, cluelessness, ... necro-posting at its best. – SYN Mar 22 '23 at 00:14
  • @SYN The problem is not specific to the IP; that's why I suggested validating what it's trying to reach. I see no reason not to suggest problems that might restrict traffic from reaching a cluster IP, especially if someone is not aware of network restrictions on the cluster. It being the API server is part of it, not all of it. – Captain Levi Mar 22 '23 at 16:07