We have a running Kafka Connect cluster (Strimzi distribution), deployed in an OpenShift (Kubernetes, for that matter) cluster, that is showing erratic behaviour.

  • The Kafka Connect REST API is randomly slow, and very slow for some endpoints, even when the cluster is not under heavy load:
    • Query a connector
    • Delete a connector
    • Create a connector
  • But it always works perfectly when querying the list of connectors
  • Connectors and tasks appear as UNASSIGNED when we query the list of connectors in the cluster, although the logs show them running
    • /connectors?expand=info&expand=status
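To quantify how many connectors and tasks report UNASSIGNED, the response of that endpoint can be tallied by state. The following is a minimal Python sketch, not part of our deployment. The function names `fetch_expanded` and `summarize_states` and the `base_url` parameter are hypothetical, and the payload shape (connector name mapped to an object with a `status` key) is assumed to match what `/connectors?expand=status` returns.

```python
import json
import urllib.request
from collections import Counter


def fetch_expanded(base_url, timeout=30):
    """GET /connectors?expand=status from a Connect worker.

    base_url is a placeholder such as "http://<connect-host>:8083".
    """
    url = base_url.rstrip("/") + "/connectors?expand=status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_states(expanded):
    """Count connector and task states in an expanded connector listing.

    Assumed input shape:
    {"<name>": {"status": {"connector": {"state": ...},
                            "tasks": [{"id": 0, "state": ...}]}}}
    """
    connector_states = Counter()
    task_states = Counter()
    for entry in expanded.values():
        status = entry.get("status", {})
        state = status.get("connector", {}).get("state", "UNKNOWN")
        connector_states[state] += 1
        for task in status.get("tasks", []):
            task_states[task.get("state", "UNKNOWN")] += 1
    return connector_states, task_states
```

Calling `summarize_states(fetch_expanded(base_url))` returns two counters, one for connectors and one for tasks, which makes it easy to compare the number of UNASSIGNED entries against what the worker logs show.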

We have checked the communication between workers.

There are 5 workers in the cluster, each one consuming about 12 GB of RAM and 1.5 cores. There are 1000 connectors running in the cluster, all of them CloudantSourceConnector (Cloudant is a CouchDB implementation by IBM).
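To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the connectors are spread evenly across the workers, that each connector runs a single task (as in the status sample further down), and that 1 GB is counted as 1024 MB. The function names are only illustrative.

```python
def tasks_per_worker(connectors, workers, tasks_per_connector=1):
    """Average number of tasks per worker, assuming an even spread."""
    return connectors * tasks_per_connector / workers


def memory_per_task_mb(worker_memory_gb, tasks_on_worker):
    """Average memory per task in MB (1 GB counted as 1024 MB)."""
    return worker_memory_gb * 1024 / tasks_on_worker


def millicores_per_task(worker_cores, tasks_on_worker):
    """Average CPU per task in millicores."""
    return worker_cores * 1000 / tasks_on_worker


load = tasks_per_worker(connectors=1000, workers=5)
print("tasks per worker:", load)
print("MB per task:", round(memory_per_task_mb(12, load), 2))
print("millicores per task:", millicores_per_task(1.5, load))
```

Under those assumptions each worker hosts about 200 tasks, which works out to roughly 61 MB of RAM and 7.5 millicores per task on average.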

Is this level of resource consumption normal?

What could be causing the REST API timeouts?

Thanks a lot.

Cluster configuration

      version: 2.6.0

      replicas: 5

      config:
        group.id: prod-cluster-group

        config.storage.replication.factor: 3
        config.storage.topic: prod-cluster-configs

        offset.storage.replication.factor: 3
        offset.storage.topic: prod-cluster-offsets

        status.storage.replication.factor: 3
        status.storage.topic: prod-cluster-status

        key.converter.schemas.enable: false
        key.converter: org.apache.kafka.connect.json.JsonConverter
        value.converter.schemas.enable: false
        value.converter: org.apache.kafka.connect.json.JsonConverter

        max.poll.interval.ms: 600000
        max.poll.records: 10

Connector configuration

Each connector has this configuration. All of them write to the same topic, and each one reads from a different Cloudant database.

"config": {
    "connector.class": "com.ibm.cloudant.kafka.connect.CloudantSourceConnector",
    "cloudant.omit.design.docs": "true",
    "cloudant.db.username": "__REDACTED__",
    "topics": "prod-topic",
    "cloudant.db.password": "__REDACTED__",
    "connection.timeout.ms": "5000",
    "cloudant.value.schema.struct": "true",
    "name": "connector-0001", // (0000...1000)
    "read.timeout.ms": "5000",
    "cloudant.db.url": "__REDACTED__"
}

UNASSIGNED Connector running in the cluster

"status": {
    "name": "connector-0001",
    "connector": {
        "state": "UNASSIGNED"
    },
    "tasks": [
        {
            "id": 0,
            "state": "UNASSIGNED"
        }
    ],
    "type": "source"
}

Cluster log showing the task getting records from Cloudant

(com.ibm.cloudant.kafka.connect.CloudantSourceTask) [task-thread-connector-0001-0]
2022-09-29 08:38:56,624 INFO Return 4 records with last offset ...
jmoreno