0

I'm working on a transformation and stuck with a common problem. Any assist is well appreciated.

Scenario: Step-1: Reading from a delta table.

+--------+------------------+
| emp_id |     str          |
+--------+------------------+
|   1    | name=qwerty.     |
|   2    | age=22           | 
|   3    | job=googling     | 
|   4    | dob=12-Jan-2001  | 
|   5    | weight=62.7.     | 
+--------+------------------+

Step-2: I'm refining the data and outputting it into another delta table dynamically (No predefined schema). Let's say I'm adding null if the column name is not found.

+--------+--------+------+----------+-------------+--------+
| emp_id |  name  |  age |    job   |     dob     | weight |
+--------+--------+------+----------+-------------+--------+
|   1    | qwerty | null |   null   |     null    |  null  |
|   2    |  null  |  22  |   null   |     null    |  null  | 
|   3    |  null  | null | googling |     null    |  null  | 
|   4    |  null  | null |   null   | 12-Jan-2001 |  null  | 
|   5    |  null  | null |   null   |     null    |  62.7  | 
+--------+--------+------+----------+-------------+--------+

Is there a way to apply validation in step-2 based on the column name? I'm splitting it by = while deriving the above table. Or do I have to do validation in step-3 while working on the new df?

Second question: Is there a way to achieve the following table?

+--------+--------+------+----------+-------------+--------+---------------------+
| emp_id |  name  |  age |    job   |     dob     | weight | missing_attributes  |
+--------+--------+------+----------+-------------+--------+---------------------+
|   1    | qwerty | null |   null   |     null    |  null  | age,job,dob,weight  |
|   2    |  null  |  22  |   null   |     null    |  null  | name,job,dob,weight | 
|   3    |  null  | null | googling |     null    |  null  | name,age,dob,weight |
|   4    |  null  | null |   null   | 12-Jan-2001 |  null  | name,age,job,weight |
|   5    |  null  | null |   null   |     null    |  62.7  | name,age,job,dob    |
+--------+--------+------+----------+-------------+--------+---------------------+
boring-coder
  • 63
  • 1
  • 5
  • you can use constraints for the same kind of validation and similary for your second questions : Case statement would help to achieve the same. – sai saran Feb 10 '23 at 10:40
  • This won't work if I'm creating the table dynamically because I won't be defining any schema. – boring-coder Feb 21 '23 at 09:30

0 Answers0