
A new topic, testmd, is created:

$KAFKA_HOME/bin/kafka-topics.sh --create --topic testmd --replication-factor 3 --partitions 3 --zookeeper rhes75:2181,rhes564:2181,rhes76:2181
Created topic testmd.
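
As a quick check, the new topic's layout can be described (illustrative; the output will vary by cluster, and the --zookeeper flag matches the Kafka version used above):

$KAFKA_HOME/bin/kafka-topics.sh --describe --topic testmd --zookeeper rhes75:2181,rhes564:2181,rhes76:2181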

This is the content of the standalone properties file:


 cat etc/connect-standalone.properties
bootstrap.servers=rhes75:9092,rhes75:9093,rhes75:9094,rhes564:9092,rhes564:9093,rhes564:9094,rhes76:9092,rhes76:9093,rhes76:9094
key.converter=org.apache.kafka.connect.storage.StringConverter
#key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect_bq.offsets
offset.flush.interval.ms=10000
plugin.path=/d4T/hduser/bigquery-kafka-connect-sink/share/kafka/plugins

This is the content of the sink property file:

name=bigquery-sink
connector.type=bigquery-connector
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
defaultDataset=test
project=project_name
topics=testmd
autoCreateTables=false
gcsBucketName=tmp_storage_bucket
queueSize=-1
bigQueryRetry=0
bigQueryRetryWait=1000
bigQueryMessageTimePartitioning=false
bigQueryPartitionDecorator=true
timePartitioningType=DAY
keySource=FILE
keyfile=xyz.json
sanitizeTopics=false
schemaRetriever=com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever
threadPoolSize=10
allBQFieldsNullable=false
avroDataCacheSize=100
batchLoadIntervalSec=120
convertDoubleSpecialValues=false
enableBatchLoad=false
upsertEnabled=false
deleteEnabled=false
mergeIntervalMs=60000
mergeRecordsThreshold=-1
autoCreateBucket=true
allowNewBigQueryFields=false
allowBigQueryRequiredFieldRelaxation=false
allowSchemaUnionization=false
kafkaDataFieldName=null
kafkaKeyFieldName=null
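
For reference, with both files in place the connector would be started in standalone mode along these lines (the sink file name etc/bigquery-sink.properties is an assumption; substitute the actual path):

$KAFKA_HOME/bin/connect-standalone.sh etc/connect-standalone.properties etc/bigquery-sink.properties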

The same random test data was created and fed into topic testmd. Reading it back with the console consumer:

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server $bootstrapservers --from-beginning --topic testmd --property print.key=true
4b1201a3-a12c-429a-84ab-a56625c42410     {"schema": { "type": "struct", "fields": [ { "field": "rowkey", "type": "string", "optional": true}],"optional": false,"name": "test.md"}, "payload": {"rowkey": "4b1201a3-a12c-429a-84ab-a56625c42410"}}
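
For illustration, a keyed record in this schema/payload envelope could be fed in with the console producer along these lines (the '|' key separator is an assumption, not the exact script used to generate the data):

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list $bootstrapservers --topic testmd --property parse.key=true --property key.separator='|'
# each input line takes the form: <uuid>|{"schema": {...}, "payload": {"rowkey": "<uuid>"}}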

Running the connector again gives the same error:

com.wepay.kafka.connect.bigquery.exception.ConversionConnectException: Top-level Kafka Connect schema must be of type 'struct'
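
If an older record without the schema/payload envelope were stuck in the topic (as discussed in the comments for the original md topic), one diagnostic sketch is to scan the whole topic for values that lack a "schema" field:

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server $bootstrapservers --from-beginning --topic testmd --timeout-ms 10000 | grep -v '"schema"'
# any line printed here is a candidate malformed record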

And this is the output of kafka-consumer-groups.sh:


/d4T/hduser/bigquery-kafka-connect-sink> $KAFKA_HOME/bin/kafka-consumer-groups.sh --bootstrap-server rhes75:9092,rhes75:9093,rhes75:9094,rhes564:9092,rhes564:9093,rhes564:9094,rhes76:9092,rhes76:9093,rhes76:9094 --describe --all-groups

Consumer group 'connect-bigquery-sink' has no active members.

GROUP                  TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                            HOST            CLIENT-ID
console-consumer-24314 testmd          0          -               0               -               consumer-console-consumer-24314-1-2fb0573c-4469-41af-9e6c-3b32bc585abb /50.140.197.220 consumer-console-consumer-24314-1
console-consumer-24314 testmd          1          -               1               -               consumer-console-consumer-24314-1-2fb0573c-4469-41af-9e6c-3b32bc585abb /50.140.197.220 consumer-console-consumer-24314-1
console-consumer-24314 testmd          2          -               0               -               consumer-console-consumer-24314-1-2fb0573c-4469-41af-9e6c-3b32bc585abb /50.140.197.220 consumer-console-consumer-24314-1
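
If the sink's consumer group had committed offsets pointing at an earlier malformed record, they could be reset while the connector is stopped; a sketch, using the group name connect-bigquery-sink from the output above (--to-latest skips past older records; adjust as needed):

$KAFKA_HOME/bin/kafka-consumer-groups.sh --bootstrap-server $bootstrapservers --group connect-bigquery-sink --topic testmd --reset-offsets --to-latest --execute
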
  • Hi. Any help/suggestions on this please? Thanks – Mich Talebzadeh Mar 18 '21 at 22:11
  • You could also download the connector sources and setup debugging https://stackoverflow.com/questions/45717658/what-is-a-simple-effective-way-to-debug-custom-kafka-connectors – OneCricketeer Mar 19 '21 at 15:38
  • Thanks for the comments. At the moment I generate one row of data to pass to Kafka. I have simplified it so that only one column is sent. It is described in the section updated 19/03/2021 @ 08:00 AM London time. Hope that answers your query for now. Only one topic called "md". b05ffd95-022e-4382-9ff0-e4404bafe94d: {"schema": { "type": "struct", "fields": [ { "field": "rowkey", "type": "string", "optional": true}],"optional": false,"name": "test.md"}, "payload": {"rowkey": "b05ffd95-022e-4382-9ff0-e4404bafe94d"}} – Mich Talebzadeh Mar 19 '21 at 16:02
  • Sure $KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server rhes75:9092,rhes75:9093,rhes75:9094,rhes564:9092,rhes564:9093,rhes564:9094,rhes76:9092,rhes76:9093,rhes76:9094 --from-beginning --topic md --property print.key=true 5bf14ead-daec-41df-a382-5220489d60bb {"schema": { "type": "struct", "fields": [ { "field": "rowkey", "type": "string", "optional": true}],"optional": false,"name": "test.md"}, "payload": {"rowkey": "5bf14ead-daec-41df-a382-5220489d60bb"}} – Mich Talebzadeh Mar 19 '21 at 17:24
  • @OneCricketeer when I left the connector running, this error came: ERROR Found blob tmp_storage_bucket_kafka/tmp/ with no metadata. (com.wepay.kafka.connect.bigquery.GCSToBQLoadRunnable:150). FYI tmp_storage_bucket_kafka/tmp/ is the bucket folder in Google Cloud where, I believe, the topic data lands before the row is inserted into the BigQuery table. – Mich Talebzadeh Mar 19 '21 at 19:35
  • Added the output of $KAFKA_HOME/bin/kafka-consumer-groups.sh --bootstrap-server ${bootstrapservers} --describe --all-groups to the main body – Mich Talebzadeh Mar 20 '21 at 15:04
  • Okay, according to that, your md topic has millions of records, not just the one you've shown (and there's no group for your sink connector), which means that it's likely failing on the very first record, which has no Schema/Struct information, and producing new data into the same topic isn't going to fix that... Can you try a brand new topic? – OneCricketeer Mar 20 '21 at 15:10
  • New topic testmd created. Please see the body, thanks. – Mich Talebzadeh Mar 20 '21 at 16:22
  • Appreciate any update on this. Thanks – Mich Talebzadeh Mar 22 '21 at 11:42
  • I don't have any suggestions (other than try Avro) since I don't have a BQ environment to test against. The error is internal to the connector code, which is open source, so either setup a debugger based on the comment I gave above, or look at the full stacktrace to see where the issue happens – OneCricketeer Mar 22 '21 at 14:50
  • Thanks. What do you think is the likely cause for such error? You mentioned consumer groups before. – Mich Talebzadeh Mar 22 '21 at 14:53
  • I only wanted to see the groups to know if you had consumed any records at all, and it got stuck on some offset in the middle of the topic that didn't have a schema – OneCricketeer Mar 22 '21 at 14:56
  • Seems you found this issue (although that repo is deprecated), so maybe JSON just doesn't work https://github.com/wepay/kafka-connect-bigquery/issues/178 – OneCricketeer Mar 22 '21 at 14:57
