I have some tables in which I need to mask some of its columns. Columns to be masked vary from table to table and I am reading those columns from application.conf
file.
For example, for employee table as shown below
+----+------+-----+---------+
| id | name | age | address |
+----+------+-----+---------+
| 1 | abcd | 21 | India |
+----+------+-----+---------+
| 2 | qazx | 42 | Germany |
+----+------+-----+---------+
if we want to mask name and age columns then I get these columns in an sequence.
val mask = Seq("name", "age")
Expected values after masking are:
+----+----------------+----------------+---------+
| id | name | age | address |
+----+----------------+----------------+---------+
| 1 | *** Masked *** | *** Masked *** | India |
+----+----------------+----------------+---------+
| 2 | *** Masked *** | *** Masked *** | Germany |
+----+----------------+----------------+---------+
If I have employee table an data frame, then what is the way to mask these columns?
If I have payment
table as shown below and want to mask name
and salary
columns then I get mask columns in Sequence as
+----+------+--------+----------+
| id | name | salary | tax_code |
+----+------+--------+----------+
| 1 | abcd | 12345 | KT10 |
+----+------+--------+----------+
| 2 | qazx | 98765 | AD12d |
+----+------+--------+----------+
val mask = Seq("name", "salary")
I tried something like this mask.foreach(c => base.withColumn(c, regexp_replace(col(c), "^.*?$", "*** Masked ***" ) ) )
but it did not returned anything.
Thanks to @philantrovert, I found out the solution. Here is the solution I used:
def maskData(base: DataFrame, maskColumns: Seq[String]) = {
val maskExpr = base.columns.map { col => if(maskColumns.contains(col)) s"'*** Masked ***' as ${col}" else col }
base.selectExpr(maskExpr: _*)
}