
I produce to / consume from Kafka in JSON and save to HDFS in JSON, using the properties below:

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
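
For reference, these converter settings live in the Connect worker properties file that is passed to connect-standalone. A minimal sketch of such a worker file, assuming a local broker and a standalone offset file under /tmp:

# Connect standalone worker properties (local setup assumed)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets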

Producer:

curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
      --data '{"records": [{"value": {"schema": {"type": "boolean", "optional": false, "name": "bool", "version": 2, "doc": "the documentation", "parameters": {"foo": "bar"}}, "payload": true}}]}' \
      "http://localhost:8082/topics/test_hdfs_json"

Consumer:

./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties
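
The quickstart-hdfs.properties passed above holds the sink connector configuration. A minimal sketch of what it needs, with the topic matched to the producer above and the HDFS URL assumed to be a local namenode:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_hdfs_json
hdfs.url=hdfs://localhost:9000
flush.size=3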

Issue-1:

With

key.converter.schemas.enable=true
value.converter.schemas.enable=true

I get the following exception:

org.apache.kafka.connect.errors.DataException: JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
    at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:332)

Issue-2:

Enabling the above two properties does not throw any exception, but no data is written to HDFS.

Any suggestion will be highly appreciated.

Thanks

Pratim Ghosh

2 Answers


The converter controls how data from the Kafka topic is translated so that the connector can interpret it and write it to HDFS. Out of the box, the HDFS connector only supports writing to HDFS in Avro or Parquet. You can find information on how to extend the format to JSON here. If you make such an extension, I encourage you to contribute it to the connector's open source project.
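
If you do build such an extension, it would be plugged in through the connector's format.class setting in quickstart-hdfs.properties. A sketch, where com.example.JsonFormat is a hypothetical stand-in for your own implementation:

# hypothetical custom format that writes JSON instead of Avro/Parquet
format.class=com.example.JsonFormat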

dawsaw
  • Thanks for your suggestion! – Pratim Ghosh Dec 12 '16 at 09:47
  • @dawsaw Do you know if such an extension is achievable using native kafka connect api? – TheRealJimShady Oct 13 '17 at 13:04
  • There is a JsonConverter that already ships with Kafka. I think the question here is specific to an output format for the HDFS connector, which necessarily means extending the connector, not doing anything natively with Connect itself if I have understood your question properly. – dawsaw Oct 15 '17 at 02:15

For input JSON-format messages to be written to HDFS, set the properties below:

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
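
With the StringConverter the record value is handled as an opaque string, so plain JSON messages without a schema/payload envelope can be produced. A sketch reusing the REST Proxy endpoint from the question, with a made-up {"foo": "bar"} record:

curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
      --data '{"records": [{"value": {"foo": "bar"}}]}' \
      "http://localhost:8082/topics/test_hdfs_json"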

akshat thakar