I want to schedule a pipeline that transfers MySQL data to Google BigQuery, but the complete dataset gets appended to the existing table in BQ on every run. I want only the incremental data to be appended...
Did you solve this issue? I need some references for Oracle as a source. I am clear about the data upsert in BigQuery, but I need a reference on how to track the watermark column for an incremental load. – Nikhil Suthar Nov 17 '22 at 05:42
2 Answers
You need an 'update if present' (upsert) capability, which is not currently available in the BigQuery plugin. We are working on updating the plugin to support that (opened a JIRA: https://issues.cask.co/browse/CDAP-15298). Alternatively, you can write each run's output to a separate table and then use MERGE to update the primary table. Opened a JIRA to add a Merge action plugin that allows one to merge tables (https://issues.cask.co/browse/CDAP-15297).
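For reference, the merge step described above could look roughly like the BigQuery MERGE statement below. This is a minimal sketch: the table names `staging.orders` and `prod.orders`, the key column `id`, and the remaining columns are hypothetical placeholders, not names from the original pipeline.

```sql
-- Minimal sketch of the "write to a separate table, then merge" workaround.
-- `staging.orders` is reloaded by each pipeline run; `prod.orders` is the
-- primary table. Both names and the `id` key are hypothetical.
MERGE `prod.orders` T
USING `staging.orders` S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET amount = S.amount, updated_at = S.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, amount, updated_at)
  VALUES (S.id, S.amount, S.updated_at);
```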

Nitin Motgi
How can we write only the incremental MySQL data to a separate table unless we maintain the indexes manually? If we don't, it is going to pull the complete table again, which we don't want. Is there a way around it? Maintaining the indexes manually is not feasible when a large number of tables have to be updated every day. If we could keep track of that index, we could just select the rows beyond it and append them to the BQ table, but we don't want to do that tracking by hand either. – Amit Kaushik Apr 25 '19 at 09:44
Is it possible to calculate the diff to import every time based on a column in your data?
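If the source table carries such a column (for example a last-modified timestamp), the source query can filter on it so that each run pulls only new or changed rows. A minimal sketch against MySQL, where the table `orders`, the column `updated_at`, and the `${last_watermark}` placeholder are all hypothetical; the placeholder stands for whatever high-water-mark value the previous run ended on, supplied by whatever mechanism you use to pass runtime values:

```sql
-- Minimal sketch of a column-based incremental import query on the MySQL side.
-- `orders` and `updated_at` are hypothetical names; ${last_watermark} is a
-- placeholder for the value the previous run left off at.
SELECT id, amount, updated_at
FROM orders
WHERE updated_at > '${last_watermark}'
ORDER BY updated_at;
```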

Ali Anwar
I haven't found anything regarding this. There is no way in Data Fusion to use or track the sink data from one run as the source data for the next scheduled job... – Amit Kaushik Apr 25 '19 at 09:38
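One possible workaround, offered here only as a general pattern and not as a Data Fusion feature, is to avoid passing state between runs altogether and instead read the high-water mark back from the BigQuery destination at the start of each run, then feed it into the source query above. A minimal sketch, assuming the same hypothetical `prod.orders` table and `updated_at` column:

```sql
-- Derive the watermark from the destination itself at the start of each run,
-- so no state has to be handed from one pipeline run to the next.
-- `prod.orders` and `updated_at` are hypothetical names.
SELECT MAX(updated_at) AS last_watermark
FROM `prod.orders`;
```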