I have a kafka-connect flow of mongodb->kafka connect->elasticsearch sending data end to end OK, but the payload document is JSON encoded. Here's my source mongodb document.
{
"_id": "1541527535911",
"enabled": true,
"price": 15.99,
"style": {
"color": "blue"
},
"tags": [
"shirt",
"summer"
]
}
And here's my mongodb source connector configuration:
{
"name": "redacted",
"config": {
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"databases": "redacted.redacted",
"initial.import": "true",
"topic.prefix": "redacted",
"tasks.max": "8",
"batch.size": "1",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": false,
"compression.type": "none",
"mongo.uri": "mongodb://redacted:27017/redacted",
"analyze.schema": false,
"schema.name": "__unused__",
"transforms": "RenameTopic",
"transforms.RenameTopic.type":
"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RenameTopic.regex": "redacted.redacted_Redacted",
"transforms.RenameTopic.replacement": "redacted"
}
}
Over in elasticsearch, it ends up looking like this:
{
"_index" : "redacted",
"_type" : "kafka-connect",
"_id" : "{\"schema\":{\"type\":\"string\",\"optional\":true},\"payload\":\"1541527535911\"}",
"_score" : 1.0,
"_source" : {
"ts" : 1541527536,
"inc" : 2,
"id" : "1541527535911",
"database" : "redacted",
"op" : "i",
"object" : "{ \"_id\" : \"1541527535911\", \"price\" : 15.99,
\"enabled\" : true, \"tags\" : [\"shirt\", \"summer\"],
\"style\" : { \"color\" : \"blue\" } }"
}
}
I'd like to do use 2 single message transforms:
ExtractField
to grabobject
, which is a string of JSON- Something to parse that JSON into an object or just let the normal JSONConverter handle it, as long as it ends up as properly structured in elasticsearch.
I've attempted to do it with just ExtractField
in my sink config, but I see this error logged by kafka
kafka-connect_1 | org.apache.kafka.connect.errors.ConnectException:
Bulk request failed: [{"type":"mapper_parsing_exception",
"reason":"failed to parse",
"caused_by":{"type":"not_x_content_exception",
"reason":"Compressor detection can only be called on some xcontent bytes or
compressed xcontent bytes"}}]
Here's my elasticsearch sink connector configuration. In this version, I have things working but I had to code a custom ParseJson SMT. It's working well, but if there's a better way or a way to do this with some combination of built-in stuff (converters, SMTs, whatever works), I'd love to see that.
{
"name": "redacted",
"config": {
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"batch.size": 1,
"connection.url": "http://redacted:9200",
"key.converter.schemas.enable": true,
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"schema.ignore": true,
"tasks.max": "1",
"topics": "redacted",
"transforms": "ExtractFieldPayload,ExtractFieldObject,ParseJson,ReplaceId",
"transforms.ExtractFieldPayload.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldPayload.field": "payload",
"transforms.ExtractFieldObject.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldObject.field": "object",
"transforms.ParseJson.type": "reaction.kafka.connect.transforms.ParseJson",
"transforms.ReplaceId.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceId.renames": "_id:id",
"type.name": "kafka-connect",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false
}
}