6

I successfully installed Postgres Debezium CDC. Now, I'm able to catch all changes happening to the database. But the problem is "before" field always stays empty. So, if I insert a record (id = 1, name = Bill) I then get from Kafka this data:

'payload': {'before': None, 'after': {'id': 1, 'name': 'Bill'}, ...

But if I update the record like so:

UPDATE mytable set name = 'Bob' WHERE id = 1

I get this from Kafka:

'payload': {'before': None, 'after': {'id': 1, 'name': 'Bob'}, ...

This is how I configured my connector:

curl -X POST  localhost:8083/connectors/ \
  -H "Accept:application/json" -H "Content-Type:application/json" -d \
'{
    "name": "test-connector",
    "config": {
         "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
         "tasks.max": "1",
         "plugin.name": "pgoutput",
         "database.hostname": "postgres",
         "database.port": "5432",
         "database.user": "postgres",
         "database.password": "postgres",
         "database.dbname" : "test",
         "database.server.name": "postgres",
         "database.whitelist": "public.mytable",
         "database.history.kafka.bootstrap.servers": "kafka:9092",
         "database.history.kafka.topic": "public.topic"
    }
}'

What is wrong with that and how can I fix it?

Iskuskov Alexander
  • 4,077
  • 3
  • 23
  • 38
Jacobian
  • 10,122
  • 29
  • 128
  • 221
  • How do you get messages from Kafka? – Iskuskov Alexander Jan 19 '20 at 22:03
  • @Iskuskov Alexander. I tried consumers in two languages - PHP and Python. They both get messages from Kafka and print their body and in both cases I see that `before` property is None (in Python) or null (in PHP). – Jacobian Jan 20 '20 at 05:53
  • Did you consume from the earliest offset? Are you sure that message `'payload': {'before': None, 'after': {'id': 1, 'name': 'Bob'}` is about update event and not about the insert event that you read again? Please show more data in messages – Iskuskov Alexander Jan 20 '20 at 07:05
  • @Iskuskov Alexander. I'm consuming from the start. In PHP for example, I do it like so `$topic->consumeStart(0, RD_KAFKA_OFFSET_BEGINNING);`. And I'm totally sure, that the message with `name: Bob` is an update event, because it pops up, when I manually execute `UPDATE` statement. – Jacobian Jan 20 '20 at 08:44
  • Beside, `id` field is serial, so it's impossible to have two records with `id = 1` – Jacobian Jan 20 '20 at 08:47
  • Hm. I noticed this in connector logs: `REPLICA IDENTITY for 'registry.schema_version' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns`. It seems like this is the whole problem, but I'm not sure how to fix it. – Jacobian Jan 20 '20 at 08:51
  • Try to execute this command: `ALTER TABLE public.mytable REPLICA IDENTITY FULL;` – Iskuskov Alexander Jan 20 '20 at 09:13
  • @Iskuskov Alexander. Yeah, it works! – Jacobian Jan 20 '20 at 09:13
  • It's great! I've moved my comments to the answer – Iskuskov Alexander Jan 20 '20 at 09:25

1 Answers1

10

before is an optional field that if present contains the state of the row before the event occurred. Whether or not this field is available is highly dependent on the REPLICA IDENTITY setting for each table.

REPLICA IDENTITY is a PostgreSQL specific table-level setting which determines the amount of information that is available to logical decoding in case of UPDATE and DELETE events.

To show the previous values of all the table’s columns, please set the REPLICA IDENTITY level to FULL:

ALTER TABLE public.mytable REPLICA IDENTITY FULL;

See more details in the Debezium docs.

Iskuskov Alexander
  • 4,077
  • 3
  • 23
  • 38