
I was able to implement Kafka Connect on a much smaller table, but I am now trying to implement it on a larger database. My source and sink configurations are as follows:

source:

name=rds-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
table.whitelist=users,places,sales
tasks.max=1
connection.url=jdbc:postgresql://my-rds-source-url/db?user=<USERNAME>&password=<PASSWORD>
mode=timestamp+incrementing
timestamp.column.name=updated_at
incrementing.column.name=id
topic.prefix=rds_
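
For reference, I'm loading this connector with the Confluent CLI (the properties file name below is just what I called it on disk):

confluent load rds-source -d ./rds-source.properties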

sink:

name=redshift-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=rds_test
connection.url=jdbc:redshift://my-redshift-url:5439/db?user=<USERNAME>&password=<PASSWORD>
auto.create=true
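
The sink is loaded the same way (again via the Confluent CLI; the file name is mine):

confluent load redshift-sink -d ./redshift-sink.properties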

Then I downloaded the latest Redshift JDBC driver from here and placed it inside /usr/local/confluent/share/java/kafka-connect-jdbc/
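
To double-check that Connect can actually see the driver, I list the directory I copied it into:

ls /usr/local/confluent/share/java/kafka-connect-jdbc/ | grep -i redshift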

Then I initialized the Confluent Platform, created the topic, and ran the consumer with the following commands:

confluent start
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rds_test
kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --property print.key=true --topic rds_test --from-beginning
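
To rule out the connectors simply not running, I also check the Kafka Connect REST API (port 8083 is the default) and the topic list:

curl -s localhost:8083/connectors
curl -s localhost:8083/connectors/rds-source/status
curl -s localhost:8083/connectors/redshift-sink/status
kafka-topics --list --zookeeper localhost:2181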

Rather than getting an error, I just get a blank screen with no feedback. My assumption is that Kafka Connect has to scan the tables first, and I do see this query in my pg_stat_activity:

SELECT * FROM "users" WHERE "updated_at" < $1 AND (("updated_at" = $2 AND "id" > $3) OR "updated_at" > $4) ORDER BY "updated_at","id" ASC

So I'm assuming either there is an issue with Kafka Connect polling for new entries, or that this query returns something that causes a serialization error. I'm not sure what else I might be missing.
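
One sanity check I can think of: if I understand timestamp+incrementing mode correctly, the first poll effectively starts from a zero timestamp offset, so something like the following (with the placeholder credentials from above) should count the rows the connector would pick up. If it returns 0, the rows may simply not match the criteria; for example, I believe rows with a NULL updated_at are skipped in this mode.

psql "host=my-rds-source-url dbname=db user=<USERNAME>" -c "SELECT count(*) FROM users WHERE (updated_at = 'epoch' AND id > -1) OR updated_at > 'epoch';"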

Minh
