0

I am building a data flow within azure data factory and I would like to apply some GDPR masking rules within the flow.

What I would like to do is the following: In the Derived Column (or other component) I would like to match my input columns with a reference array and for the columns that matches between my input and reference array I would like to replace/mask those values.

Power point over the data flow and what I would like to do

I have tried some IN and regex functions but I have not gotten it yet. Anyone that know how and if this is possible?

Update: I might have got somewhere with the SELECT component. However, there's something that I don't quite get:

Let's say that I have a data flow parameter called ColumnsToMask of the type string[]. I define the variable content as ['a', 'b']. (a and b are two of my input columns.)

In the SELECT component I add a rule based mapping á: in($ColumnsToMask, name) That don't work for some reason. However, this works: in(['a', 'b'], name)

(By work I mean that I get the matching columns added to my output.)

Anyone knows what I am doing wrong setting my parameter?

Update2.5 Changed the text to a picture in an effort to hopefully explain it a bit better:

How come the evaluated expression works but not the expression itself?

So when I use the evaluated expression everything works like I would like it to but when I try the variable that holds the value it for some reason does not work. What should I change?

mattssok
  • 1
  • 1
  • I'm afraid to say no, derived column only support add or modify exist source schema, and the column number are fixed. If you source are dynamic and schema will be different, and you want do same data conversion or schema changes, data factory or data flow can't achieve it. In one word, it's impossible with Data Factory. – Leon Yue May 10 '21 at 08:02
  • What happened when you tried in() ? – Mark Kromer MSFT May 10 '21 at 08:09
  • @MarkKromerMSFT if I use e.g. in($ColumnsToMask, name) I get true which mean that I get all input columns as output if I remember correctly. Maybe I could modify the expression in some smart way? – mattssok May 10 '21 at 08:57
  • If this is not possible through the Derived Column, does anyone know if I could build a regexMatch with the incoming array and SELECT transform the needed columns? – mattssok May 10 '21 at 11:28
  • $mystringparam = ['a','b'] – Mark Kromer MSFT May 10 '21 at 16:30
  • @MarkKromerMSFT when I put $mystringparam = ['a','b'] then it won't validate since it expected a string and received an array. What am I missing here? – mattssok May 10 '21 at 17:03

1 Answers1

0

Your parameter definition would look something like this:

enter image description here

Mark Kromer MSFT
  • 3,578
  • 1
  • 10
  • 11
  • Would this be in my pipeline or in my dataflow? If it is in my data flow, I'm not getting in($parameter1, name) to work. I get 0 matches. (However, I do still get 2 matches with in(['a','b'], name).) I don't really get why these two would generate different results. – mattssok May 11 '21 at 04:39
  • That is in the data flow. Is "name" in this context from the "name" reserved word in a column pattern? – Mark Kromer MSFT May 11 '21 at 06:12
  • name in this instance are all my incoming columns. *Create a boolean expression that matches columns based on the name, type, stream, origin, and position of the column.* (https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-column-pattern) My incoming columns are called a, b and c. – mattssok May 11 '21 at 06:35
  • I have a source with a column name called 'fullname'. I set a parameter in my data flow like this: ['a','b','fullname']. In my derived column, I created a column pattern. My matching rule is: in($parameter1,name). For the column name expression I did this so that I can see the results: $$+'_match'. For value, I just use $$. In data preview, I can see a new column called 'fullname_match' because in() found the column name in my array. – Mark Kromer MSFT May 11 '21 at 06:40
  • Thanks I get that to work. But how come that the exact same syntax in the SELECT transformation does not work/find a match? – mattssok May 11 '21 at 09:20