
I followed the documentation at https://cloud.google.com/data-fusion/docs/tutorials/replicating-data/mysql-to-bigquery to create a Cloud Data Fusion instance and connect it to a MySQL replica (running MySQL 5.7 in replication mode, reading from another main database). Running the replication job produces a lot of info logs, and in the advanced logs I can see 503s from the API like:

Encountered status code 503 when accessing URL https://www.googleapis.com/upload/storage/v1/xxxx. Delegating to response handler for possible retry.

But the main issue, I think, is that after a while the replication job hangs with:

03/15/2021 15:19:06 INFO Step 8: - 410000 of 6205778 rows scanned from table 'schema.table' after 00:01:22.421
03/15/2021 15:19:27 ERROR Failed to merge a batch of changes from the staging table into schema.table

and then it repeats the error every 30 seconds or so.

From the error log it looks like the issue is already on the BigQuery side, when copying rows from the staging table into the actual BigQuery table; indeed the staging table has a couple of rows in it, but the actual table is empty.
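
For context, that merge step conceptually applies the staged changes to the target table with a statement along these lines; this is only a simplified, hypothetical sketch with made-up dataset, table and column names, not the exact SQL Data Fusion generates:

-- Hypothetical sketch of the merge step (names are made up for illustration).
MERGE my_dataset.my_table AS target
USING my_dataset.my_table_staging AS staging
ON target.id = staging.id
WHEN MATCHED THEN
  UPDATE SET value = staging.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (staging.id, staging.value);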

1 Answer


I solved this by looking at the entries in Logs Explorer, filtered to BigQuery, where I found the right hint:

Syntax error: Unexpected keyword HASH at...
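
For reference, a Logs Explorer filter roughly along these lines surfaced that entry; this assumes the error ends up in the BigQuery audit logs, so adjust it for your project:

resource.type="bigquery_resource"
severity>=ERROR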

So it seems our source table has a column named "hash", which does not get escaped and clashes with the reserved keyword HASH in BigQuery, resulting in a syntax error. Since the hash column isn't important for us to transfer into BigQuery, we simply deselected the column in Data Fusion so that it gets ignored.
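
To illustrate why this blows up: HASH is a reserved keyword in BigQuery standard SQL, so a column literally named hash has to be escaped with backticks wherever the generated statements reference it. A minimal reproduction (the table name is made up):

SELECT hash FROM my_dataset.my_table    -- fails: Syntax error: Unexpected keyword HASH
SELECT `hash` FROM my_dataset.my_table  -- works once the identifier is quoted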