We have a MySQL/BigQuery stream that replicates data in 0 seconds. Data backfilled with no issues and stream went to Running
state as expected.
RDS MySQL source was configured as stated in Google Docs with following configurations
binlog_format:ROW, log_bin_use_v1_row_events:1, read_only:0, net_read_timeout:3600, net_write_timeout:3600, wait_timeout:86400
In the source database we have a table that has a OnDelete Cascade
constraint. The problem is if any data in that table is deleted and cascade deletion happens, the deletion does not happen in the BigQuery
replica dataset.
This results in wrong analysis as data in source has been deleted and calculations in source database gives different figures with what is in BigQuery.
Based on discussion in this mysql site
The cascading deletes are internal to the InnoDB engine, and as
such there is no way that the server can be informed about the
fact that additional rows were affected internally. Any engine
that creates or destroys additional rows in any tables under
its control, for whatever reason, has to inform the server of
that occurrence for the rows to be logged.
Apparently, MySQL does not log OnCascade Delete entries in binlog files, so Datastream cannot replicate deletions through OnCascade to BigQuery as it depends on binlog queries.
How can we ensure data in source MySQL database be replicated completely without ignoring deletes in realtime.