0

We have a MySQL/BigQuery stream that replicates data in 0 seconds. Data backfilled with no issues and stream went to Running state as expected.

RDS MySQL source was configured as stated in Google Docs with following configurations binlog_format:ROW, log_bin_use_v1_row_events:1, read_only:0, net_read_timeout:3600, net_write_timeout:3600, wait_timeout:86400

In the source database we have a table that has a OnDelete Cascade constraint. The problem is if any data in that table is deleted and cascade deletion happens, the deletion does not happen in the BigQuery replica dataset.

This results in wrong analysis as data in source has been deleted and calculations in source database gives different figures with what is in BigQuery.

Based on discussion in this mysql site

 The cascading deletes are internal to the InnoDB engine, and as 
 such there is no way that the server can be informed about the 
 fact that additional rows were affected internally. Any engine 
  that creates or destroys additional rows in any tables under 
 its control, for whatever reason, has to inform the server of 
  that occurrence for the rows to be logged.

Apparently, MySQL does not log OnCascade Delete entries in binlog files, so Datastream cannot replicate deletions through OnCascade to BigQuery as it depends on binlog queries.

How can we ensure data in source MySQL database be replicated completely without ignoring deletes in realtime.

Yunus Einsteinium
  • 1,102
  • 4
  • 21
  • 55

0 Answers0