I'm currently integrating a MongoDB database into BigQuery through Data Fusion, and I'm running into an issue with arrays of objects. At first glance it seems like Data Fusion doesn't understand or support such data types, yet arrays do appear to be a supported feature, and that's where I get confused. Let me give you an example based on one row of data.
Order_number, items
43324 [{"name":"x","price":"10"},{"name":"y","price":"20"}]
From my testing, the only approach that actually works is to flatten items and then parse it as JSON (I've sketched the recipe below the result), which produces this:
Order_number, items_name, items_price
43324 x 10
43324 y 20
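For reference, the recipe behind that working result looks roughly like this (I'm writing it from memory, so treat the exact directives as approximate):

```
parse-as-json :items 1
flatten :items
parse-as-json :items 1
```

The first parse-as-json turns the items string into a JSON array, flatten emits one record per array element (which is what duplicates the order number), and the second parse-as-json splits each element into items_name and items_price.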
However, I'm trying to avoid duplicating the order number: I want to keep items as an array so that a single row looks like this (a schema sketch for this shape follows the example):
Order_number, items_name, items_price
43324 x 10
y 20
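As far as I understand, that shape corresponds to a nested REPEATED RECORD column in BigQuery, i.e. a table schema along these lines (my own hand-written sketch, not something generated by Data Fusion):

```json
[
  {"name": "Order_number", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "items", "type": "RECORD", "mode": "REPEATED",
    "fields": [
      {"name": "name", "type": "STRING", "mode": "NULLABLE"},
      {"name": "price", "type": "STRING", "mode": "NULLABLE"}
    ]
  }
]
```

So the target shape itself is expressible in BigQuery; the problem is getting Data Fusion to produce it.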
Here's my workflow in Data Fusion:
And here's the output schema generated in Wrangler (I've added only name for testing purposes):
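In text form, that schema is roughly equivalent to this Avro-style JSON (reconstructed by hand, so the record names are approximate):

```json
{
  "type": "record",
  "name": "etlSchemaBody",
  "fields": [
    {"name": "Order_number", "type": "long"},
    {"name": "items", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "items",
        "fields": [{"name": "name", "type": "string"}]
      }
    }}
  ]
}
```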
I propagated this schema to the BigQuery sink, and here's the error I'm getting:
Spark program 'phase-1' failed with error: null.
So there clearly is some way to work with arrays in Data Fusion, but something is off and I can't tell what. I've tested every possibility and combination I could think of, without any positive result.
The only alternative approach that produced a more meaningful error was this one:
And this returns the following error:
Spark program 'phase-1' failed with error: Unable to decode array 'items'
Maybe I need to add some directives or another plugin in Data Fusion, but honestly I'm running out of ideas. I'd appreciate any suggestions on the matter.