Cloud data fusion Parameterized pipeline using argument setter and wrangler

Question

Is there a way to create reusable data fusion pipeline that can handle multiple table transformations.

Example: I have 2 tables in BigQuery dataset in raw format and I would like to create data fusion pipeline and load transformed data in another BigQuery dataset say:

Table 1:

<table>
<thead>
<tr>
<th>ID</th>
<th>first_name</th>
<th>last_name</th>
<th>age</th>
<th>address</th>
<th>salary</th>
</tr>
</thead>
<tbody>
<tr>
<td>e111</td>
<td>Amy</td>
<td>Fowler</td>
<td>34</td>
<td>123 Stevens Ave, Dallas, TX, 75252</td>
<td>105,000</td>
</tr>
<tr>
<td>e222</td>
<td>Leonard</td>
<td>Hoffstadar</td>
<td>32</td>
<td>Apt 213 Stamford Village, Stamford, CT, USA2</td>
<td>70,000</td>
</tr>
</tbody>
</table>

Table 2:

<table>
  <thead>
    <tr>
      <th>Dept_id</th>
      <th>Dept_name</th>
      <th>Supervisor</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>d123</td>
      <td>abc</td>
      <td>Amy</td>
    </tr>
    <tr>
      <td>d234</td>
      <td>xyz</td>
      <td>John</td>
    </tr>
  </tbody>
</table>

Task:

Read table 1 and table 2 from Bigquery Raw Dataset
Apply transformation:

a). Concatenate first_name and last_name in table 1 to create another column with full name.

b). Uppercase dept_name in table 2

Load this transformed data to BigQuery Curated dataset

Now, I would like to create single reusable pipeline that could do the above job by maybe be passing table and field names as argument and the whole process is required to be automated without human input required.

Bigquery argument setter could be used, but I'd like to know if it is possible and how to apply separate transformations to individual tables in single pipeline. Does wrangler support this?

if you are not going to join the tables, why would you prefer having a single pipeline for two different use cases? IIRC, Data Fusion BigQuery block will complain since the schema si different — Marco Massetti, Nov 15 '22 at 21:55
This is just one example, I need to create one pipeline to be able to drive multiple tables to create dim and fact tables in Big query for which I'll be joining tables as well. — user17237211, Nov 16 '22 at 01:12

Cloud data fusion Parameterized pipeline using argument setter and wrangler

0 Answers0