
I am trying to land our asset data from various countries (for now, Spain and Sweden) into one table using StreamSets.

Both sources use the same identity key: Spain has a panel_ID = 1, and so does Sweden. To make each record unique I need an additional field such as CountryCode. However, that field does not exist in our source data, so I will have to add it myself (hard-coded, or automated through parameters). How can I do this in a StreamSets pipeline?

Also, in general, is this approach correct? Am I on the right track, and what else should I consider?

Shoaib Maroof

2 Answers


Add the source table to the query as an explicit column. Something like:

```sql
select 'Sweden' as country, s.*
from sweden_data s
union all
select 'Spain' as country, s.*
from spain_data s;
```

You can save this into a table or just create a view constructed like this.
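
A minimal, runnable sketch of the view approach, using Python's built-in sqlite3 module as a stand-in database. The table names sweden_data and spain_data come from the answer; the asset_name column, the sample rows and the assets table are hypothetical. It also persists the view into a table keyed on (country, panel_ID), showing that the compound key is unique even though panel_ID alone is not.

```python
import sqlite3

# In-memory database standing in for the real source system (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-country tables; both reuse panel_ID = 1.
cur.execute("CREATE TABLE sweden_data (panel_ID INTEGER, asset_name TEXT)")
cur.execute("CREATE TABLE spain_data (panel_ID INTEGER, asset_name TEXT)")
cur.execute("INSERT INTO sweden_data VALUES (1, 'Stockholm panel')")
cur.execute("INSERT INTO spain_data VALUES (1, 'Madrid panel')")

# View that tags every row with its source country, as in the answer's query.
cur.execute("""
    CREATE VIEW all_assets AS
    SELECT 'Sweden' AS country, s.* FROM sweden_data s
    UNION ALL
    SELECT 'Spain' AS country, s.* FROM spain_data s
""")

# Persist the view into a table with a compound primary key:
# the INSERT would fail if (country, panel_ID) were not unique.
cur.execute("""
    CREATE TABLE assets (
        country TEXT,
        panel_ID INTEGER,
        asset_name TEXT,
        PRIMARY KEY (country, panel_ID)
    )
""")
cur.execute("INSERT INTO assets SELECT country, panel_ID, asset_name FROM all_assets")
conn.commit()

rows = cur.execute(
    "SELECT country, panel_ID, asset_name FROM assets ORDER BY country"
).fetchall()
for row in rows:
    print(row)
```

Inserting a second Swedish row with panel_ID = 1 into assets would raise sqlite3.IntegrityError, while the same panel_ID under a different country is accepted.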

Gordon Linoff

The table name is available in the jdbc.tables record header attribute, which you can reference from the expression language (EL) as ${record:attribute('jdbc.tables')}. You can use an Expression Evaluator to copy the attribute into a field, which can then be used as part of a compound key.
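
The idea can be modeled outside StreamSets to see why it makes the key unique. This plain-Python sketch is not the StreamSets API: it treats each record as a dict with a header "attributes" map and a "value" body, and copies jdbc.tables into a hypothetical source_table field, the way an Expression Evaluator using ${record:attribute('jdbc.tables')} would. The table names and the source_table field are assumptions.

```python
from typing import Any

# A record is modeled as {"attributes": {...}, "value": {...}}.
# This mirrors the idea of a StreamSets record header and body, not its API.
Record = dict[str, Any]


def add_source_table(record: Record) -> Record:
    """Copy the jdbc.tables header attribute into a body field (hypothetical
    name source_table), leaving the incoming record unchanged."""
    value = dict(record["value"])
    value["source_table"] = record["attributes"]["jdbc.tables"]
    return {"attributes": record["attributes"], "value": value}


def compound_key(record: Record) -> tuple[str, int]:
    """Key made of the source table plus the per-country panel_ID."""
    v = record["value"]
    return (v["source_table"], v["panel_ID"])


# Two incoming records that collide on panel_ID alone.
incoming = [
    {"attributes": {"jdbc.tables": "sweden_data"}, "value": {"panel_ID": 1}},
    {"attributes": {"jdbc.tables": "spain_data"}, "value": {"panel_ID": 1}},
]

enriched = [add_source_table(r) for r in incoming]
keys = [compound_key(r) for r in enriched]
print(keys)
```

If you prefer a fixed value per pipeline instead of the table name, the same Expression Evaluator can write a runtime parameter such as ${COUNTRY_CODE} (parameter name hypothetical) into the field, which matches the "hard-coded or parameters" option in the question.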

metadaddy