I am setting up a producer that sends messages as (key, value) pairs [the key is a generated unique string, the value is a JSON payload] to Kafka topics (v1.0.0). These are pulled by Kafka Connect (v5.3.1), which then sends them to an Elasticsearch container (v7.1).
Kafka Connect is configured to look in ES for an index with the topic name (the index is already mapped in ES with a schema) and uses the Kafka key as the unique id (_id) for every document inserted into the index. Once the producer puts content into the Kafka topic, it has to be pulled by Connect and sent to ES.
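For example, if a record goes in with key abc-123 (a made-up key here, just for illustration), I expect to be able to fetch the resulting document back from ES by that id, using the host port that the compose file below maps to ES:

curl "http://localhost:9400/test-kafka/_doc/abc-123"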
Kafka Connect (5.3.1) needs the value it reads from the Kafka topic to be of the form below in order to map it to the Elasticsearch index:
{
"schema": {es_schema },
"payload":{ es_payload }
}
My producer is only able to send:
{
es_payload
}
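Concretely, if I understand the JsonConverter envelope correctly, the message for my payload would have to look roughly like this (the struct field types are my own guess, derived from the ES mapping further down; the dates would presumably travel as plain strings):

{
  "schema": {
    "type": "struct",
    "optional": false,
    "fields": [
      { "field": "ppid",   "type": "int64",  "optional": false },
      { "field": "field1", "type": "int64",  "optional": true },
      { "field": "field2", "type": "int64",  "optional": true },
      { "field": "time1",  "type": "string", "optional": true },
      { "field": "time2",  "type": "string", "optional": true },
      { "field": "status", "type": "string", "optional": true },
      { "field": "field3", "type": "int32",  "optional": true },
      { "field": "field4", "type": "int32",  "optional": true }
    ]
  },
  "payload": { "ppid": 1, "field1": 2, "field2": 1, "time1": "2019-09-25 07:36:48", "time2": "2019-09-25 07:36:48", "status": "SUCCESS", "field3": 30, "field4": 16 }
}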
I am using Docker / docker-compose containers to simulate this locally.
The producer is able to send to Kafka and the message gets picked up by Kafka Connect, but sending to Elasticsearch fails, stating that no schema was found on the payload.
My configuration for the Kafka Connect sink:
curl -X POST \
http://localhost:8083/connectors/ \
-H 'Content-Type: application/json' \
-d '{
"name": "elasticsearch-sink",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "adn-kafka",
"key.ignore": "false",
"schema.ignore": "false",
"connection.url": "http://elasticsearch:9200",
"type.name": "",
"name": "elasticsearch-sink",
"value.converter.schemas.enable": "false",
"key.converter.schemas.enable":"false"
}
}'
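After posting this, I check the connector and its task through the standard Connect REST API to confirm it was created and to see where it fails:

curl http://localhost:8083/connectors/elasticsearch-sink/status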
The error I get
Caused by: org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:510)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
If I set schema.ignore: true, it does not look for an index with the schema, and I don't think that's the right approach, because my index is already mapped and I don't want Kafka Connect to create a new index.
My docker-compose file:
version: '3'
services:
  zookeeper:
    container_name: zookeeper
    image: zookeeper
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
  kafka:
    container_name: kafka
    image: bitnami/kafka:1.0.0-r5
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_BROKER_ID: "42"
      KAFKA_ADVERTISED_HOST_NAME: "kafka"
      ALLOW_PLAINTEXT_LISTENER: "yes"
  elasticsearch:
    container_name: elasticsearch
    image: elasticsearch:7.1.1
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    environment:
      - cluster.name=docker-cluster
      - node.name=node1
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
      - discovery.type=single-node
    ports:
      - "9400:9200"
      - "9500:9300"
    deploy:
      resources:
        limits:
          memory: 6G
        reservations:
          memory: 6G
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.1.1
    # environment:
    #   - SERVER_NAME=Local kibana
    #   - SERVER_HOST=0.0.0.0
    #   - ELASTICSEARCH_URL=elasticsearch:9400
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
  kafka-connect:
    container_name: kafka-connect
    image: confluentinc/cp-kafka-connect:5.3.1
    ports:
      - 8083:8083
    depends_on:
      - zookeeper
      - kafka
    volumes:
      - $PWD/connect-plugins:/connect-plugins
    environment:
      CONNECT_BOOTSTRAP_SERVERS: "kafka:9092"
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: kafka-connect
      CONNECT_CONFIG_STORAGE_TOPIC: docker-kafka-connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: docker-kafka-connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: docker-kafka-connect-status
      CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter"
      CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_KEY_CONVERTER-SCHEMAS_ENABLE: "false"
      CONNECT_VALUE_CONVERTER-SCHEMAS_ENABLE: "false"
      CONNECT_REST_ADVERTISED_HOST_NAME: "kafka-connect"
      CONNECT_LOG4J_ROOT_LOGLEVEL: "INFO"
      CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR"
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_PLUGIN_PATH: '/usr/share/java'
      # Interceptor config
      CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      CLASSPATH: /usr/share/java/monitoring-interceptors/monitoring-interceptors-5.3.1.jar
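To confirm the Elasticsearch sink plugin is actually picked up from CONNECT_PLUGIN_PATH, I can list the plugins installed on the worker:

curl http://localhost:8083/connector-plugins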
Kafka topic name: test-kafka
ES index: test-kafka
ES mapping:
{
"mappings":{
"properties" :{
"ppid":{
"type":"long"
},
"field1":{
"type":"long"
},
"field2":{
"type":"long"
},
"time1":{
"type":"date",
"format":"yyyy-MM-dd HH:mm:ss"
},
"time2":{
"type":"date",
"format":"yyyy-MM-dd HH:mm:ss"
},
"status":{
"type":"keyword"
},
"field3":{
"type":"integer"
},
"field4":{
"type":"integer"
}
}
}
}
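The index is created up front with that mapping, roughly like this (mapping.json is just an assumed local file holding the mapping above; 9400 is the host port mapped to ES in the compose file):

curl -X PUT "http://localhost:9400/test-kafka" \
  -H 'Content-Type: application/json' \
  -d @mapping.json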
Payload being sent to the Kafka topic:
{ "ppid" : 1, "field1":2 , "field2":1,"time1":"2019-09-25 07:36:48", "time2":"2019-09-25 07:36:48", "status":"SUCCESS", "field3":30,"field4":16}