I am new to BigQuery, and would appreciate your help. I am trying to keep my case generic so that it can be useful to others. I'm sure it's pretty basic and I am missing something...
I have 2 datasets, "Master" and "New". Master contains several columns, and already has lots of rows. I want to merge a new dataset into the master list, but the entries from this new dataset may or may not have values in all the columns. In addition, the values in the new list may or may not be already present in the master list. There is one unique / primary key in both (e.g. Col1) For example:
Master:
Col1 | Col2 | Col3 | Col4
Val11 | Val12 | Val13 | Val14
Val21 | Val22 | Val23 | Val24
Val31 | Val32 | Val33 | Val34
New:
Col1 | Col5 | Col6
Val11 | Val15 | null
Val41 | Val45 | Val46
Here are my questions:
1) what bigquery would I have to write in order to merge the two datasets, with the following requirements: a) values should be merged in records, e.g. when the previous record in master had 'null' somewhere, and the new dataset has a value for it, then it should appear there. b) if there is a 'conflict' then the old value should be overwritten by the new value
This is basically just a full outer join, I think, but BigQuery only supports Inner and Left Outer joins...
2) in case it is possible, using some more nested logic, I would like to be able to do 2b) in a more refined way, such as "if there is a conflict, and the original value is X, then overwrite, else don't overwrite/keep the origial value for that one column (other values for that record would be unaffected).
3) in case it is possible, using some more nested logic, I would like to be able to say something like "if val34 and val36 are both TRUE, then set val38 to TRUE as well, else if val37 is X then set val38 to X"
Now, these are 3 different but related questions, if you can help me on either, it would be really great. Btw, I am using bigquery, because the datasets are too large for a normal DB to handle, and because it is a very convenient tool that I can easily share within the entire team.
THANKS A LOT for your help and pointers!
p.s. here is the BigQuery Documentation https://developers.google.com/bigquery/query-reference