
My Main Question: Why is the schema-registry crashing?

Peripheral Question: Why are two pods launching for each of zookeeper/kafka/schema-registry if I've configured one server for each? Does everything else look basically right?

➜  helm repo update
<snip>

➜  helm install --values values.yaml --name my-confluent-oss confluentinc/cp-helm-charts
<snip>

➜  helm list
NAME                REVISION    UPDATED                     STATUS      CHART                   APP VERSION NAMESPACE
my-confluent-oss    1           Sat Oct 20 19:09:08 2018    DEPLOYED    cp-helm-charts-0.1.0    1.0         default  

➜  kubectl get pods
NAME                                                   READY     STATUS             RESTARTS   AGE
my-confluent-oss-cp-kafka-0                            2/2       Running            0          20m
my-confluent-oss-cp-schema-registry-59d8877584-c2jc7   1/2       CrashLoopBackOff   7          20m
my-confluent-oss-cp-zookeeper-0                        2/2       Running            0          20m

My values.yaml is as follows. I've tested it out with `helm install --debug --dry-run`. I'm just disabling persistence, setting a single server (this is a development setup for running in a VM), and disabling the extra services for the moment until I get the basics working:

cp-kafka:
  brokers: 1
  persistence:
    enabled: false

  cp-zookeeper:
    persistence:
      enabled: false
    servers: 1

cp-zookeeper:
  persistence:
    enabled: false
  servers: 1

cp-kafka-connect:
  enabled: false

cp-kafka-rest:
  enabled: false

cp-ksql-server:
  enabled: false

Here are the logs for the failing schema-registry:

➜  kubectl logs my-confluent-oss-cp-schema-registry-59d8877584-c2jc7 cp-schema-registry-server

<snip>
[2018-10-21 00:28:14,738] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,738] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,751] INFO Cluster ID: ofJRwpXzRn-ltDn8b_6h3A (org.apache.kafka.clients.Metadata)
[2018-10-21 00:28:14,753] INFO Initialized last consumed offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:28:14,756] INFO [kafka-store-reader-thread-_schemas]: Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:28:14,800] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=my-confluent-oss] Resetting offset for partition _schemas-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
[2018-10-21 00:28:14,821] INFO Cluster ID: ofJRwpXzRn-ltDn8b_6h3A (org.apache.kafka.clients.Metadata)
[2018-10-21 00:28:14,857] INFO Wait to catch up until the offset of the last message at 7 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2018-10-21 00:28:14,930] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2018-10-21 00:28:14,939] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,940] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,953] INFO Cluster ID: ofJRwpXzRn-ltDn8b_6h3A (org.apache.kafka.clients.Metadata)
[2018-10-21 00:29:14,945] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:220)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:63)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:41)
    at io.confluent.rest.Application.createServer(Application.java:169)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector.init(KafkaGroupMasterElector.java:202)
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:215)
    ... 4 more
[2018-10-21 00:29:14,948] INFO Shutting down schema registry (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2018-10-21 00:29:14,949] INFO [kafka-store-reader-thread-_schemas]: Shutting down (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,950] INFO [kafka-store-reader-thread-_schemas]: Stopped (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,951] INFO [kafka-store-reader-thread-_schemas]: Shutdown completed (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,953] INFO KafkaStoreReaderThread shutdown complete. (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,953] INFO [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2018-10-21 00:29:14,959] ERROR Unexpected exception in schema registry group processing thread (io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector)
org.apache.kafka.common.errors.WakeupException
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:498)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:284)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:243)
    at io.confluent.kafka.schemaregistry.masterelector.kafka.SchemaRegistryCoordinator.ensureCoordinatorReady(SchemaRegistryCoordinator.java:207)
    at io.confluent.kafka.schemaregistry.masterelector.kafka.SchemaRegistryCoordinator.poll(SchemaRegistryCoordinator.java:97)
    at io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector$1.run(KafkaGroupMasterElector.java:192)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I'm using minikube 0.30.0 and a fresh, clean minikube VM:

➜  kubectl version

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-22T05:40:33Z", GoVersion:"go1.9.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

1 Answer


Your schema registry can't join its Kafka group. You'll have to check the configs: the schema registry performs a leader election on startup, and that election can go through either ZooKeeper or Kafka.
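
For context, the election mode follows from which store setting the schema registry container is given. A rough sketch of the two alternatives as deployment env entries (you would set one or the other, not both; the ZooKeeper host below is an assumption based on this release's naming, while the Kafka value shows up later in the comments):

env:
  # ZooKeeper-based leader election (legacy style) -- assumed service name
  - name: SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL
    value: my-confluent-oss-cp-zookeeper-headless:2181
  # Kafka-based leader election (what this chart configures)
  - name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
    value: PLAINTEXT://my-confluent-oss-cp-kafka-headless:9092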

It looks like the Helm chart installs the schema registry with Kafka-based leader election. You can pass the Kafka broker parameter manually, or it gets picked up from .Values.kafka.bootstrapServers, whose default appears to be empty. You can see what value ended up in your deployment by running something like:

$ kubectl get deployment my-confluent-oss-cp-schema-registry -o=yaml
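
If you only want the container environment rather than the whole manifest, something like this narrows it down (jsonpath output formatting varies a bit between kubectl versions):

$ kubectl get deployment my-confluent-oss-cp-schema-registry \
    -o jsonpath='{.spec.template.spec.containers[*].env}'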

Then you can change it to point at the internal Kubernetes my-confluent-oss-cp-kafka service endpoint with:

$ kubectl edit deployment my-confluent-oss-cp-schema-registry

Also, note that as of this writing the cp-helm-charts are in developer preview, so use them at your own risk.

The other parameter you can configure is SCHEMA_REGISTRY_KAFKASTORE_INIT_TIMEOUT_CONFIG, since that is exactly where you are seeing the error: the schema registry may be timing out while trying to connect to the Kafka store (possibly related to minikube). What's odd is that it should retry.
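
Note the error in your log lands exactly 60 seconds after startup, which lines up with a 60000 ms default for that timeout. If you'd rather raise it through the chart than by editing the deployment, here is a hedged values.yaml sketch; it assumes the cp-schema-registry subchart exposes a configurationOverrides map that gets rendered into SCHEMA_REGISTRY_* environment variables, and the 120000 ms value is only an example:

cp-schema-registry:
  configurationOverrides:
    # assumed override key; doubles the default 60000 ms
    "kafkastore.init.timeout.ms": "120000"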

Rico
• `kubectl get deployment cp-schema-registry` didn't work. Maybe you meant `kubectl describe deployment my-confluent-oss-cp-schema-registry`? That mentions the environment variable `SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://my-confluent-oss-cp-kafka-headless:9092`. Also, I understand this is a dev preview, use at your own risk, but I do expect the very basics to start up without crashing. – clay Oct 21 '18 at 17:25
• Yeah, I meant `my-confluent-oss-cp-schema-registry`; I've changed the answer a bit. The Helm chart prepends what's given to the `--name` option to all deployments, etc. Looks like it basically can't talk to the headless service on port 9092; it might be a networking issue within minikube. You can shell into the pod with `kubectl exec -it <pod> -- sh` and see if you can connect from there. – Rico Oct 21 '18 at 17:36
  • I also suggested `describe` rather than `get`. Did you mean that too? – clay Oct 21 '18 at 18:14
• Either one works: `get` with `-o=yaml`, or just `describe`. – Rico Oct 21 '18 at 18:15
• If I shell into the schema-registry pod, I have to use the jmx metrics container; if I use the cp-schema-registry-server container I get errors, presumably because it is crashing. But from that pod I can successfully ping `my-confluent-oss-cp-kafka-headless` and get a TCP connection to port 9092. – clay Oct 21 '18 at 18:49
  • In the schema-registry pod/container logs, I see `ConsumerConfig values`... `bootstrap.servers = [PLAINTEXT://my-confluent-oss-cp-kafka-headless:9092]` – clay Oct 21 '18 at 18:52
• Config looks good and connectivity looks good. I updated the answer; it looks like it may be timing out before connecting to the Kafka store. – Rico Oct 21 '18 at 22:03
• My last comment was wrong and I'm deleting it. The mistake was in my config: I set cp-kafka brokers to 1, but `cp-kafka.configurationOverrides.offsets.topic.replication.factor` was at its default of 3, which caused the Kafka broker to fail, which in turn caused schema-registry to fail. I fixed that and it works fine. The problem was my config (see the corrected values sketch after these comments). – clay Oct 22 '18 at 22:14
• Ahh, cool. Yeah, if your Kafka broker is down, it won't be able to talk to it :-) – Rico Oct 22 '18 at 22:15
  • @clay Thanks for that. I opened an issue on the repo: https://github.com/confluentinc/cp-helm-charts/issues/236 – Omer van Kloeten Jan 30 '19 at 11:47
• Thanks @clay - fixed my problem too! – Alessandro Santini Nov 02 '20 at 22:05
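
Following up on the fix clay describes above: with a single broker, internal topics can't be created at a replication factor of 3, so the broker fails and the registry never finishes joining its group. A minimal sketch of the corrected values.yaml section (the override key assumes the cp-kafka subchart's configurationOverrides mechanism named in that comment):

cp-kafka:
  brokers: 1
  persistence:
    enabled: false
  # With one broker, internal topic replication must not exceed 1.
  configurationOverrides:
    "offsets.topic.replication.factor": "1"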