I have a dataframe as below:
+-----------------------------------------------------------------+
|ID|DATASET |
+-----------------------------------------------------------------+
|4A|[{"col.1":"12ABC","col.2":"141","col.3":"","col.4":"ABCD"}] |
|8B|[{"col.1":"12ABC","col.2":"141","col.3":"","col.4":"ABCD"}] |
+-----------------------------------------------------------------+
Expected Output:
+----------------------------------+
|ID|col_1 | col_2 | col_3| col_4 |
+----------------------------------+
|4A|"12ABC"|"141"||"ABCD" |
|8B|"12ABC"|"141"||"ABCD" |
+--------------------------------- +
- i tried to use regex_extract :
df.withColumn("col_1", regexp_extract("DATASET", "(?<=col.1:)\w+(?=(,|}|]))", 0)).withColumn("col_2", regexp_extract("DATASET", "(?<=col.2:)\w+(?=(,|}|]))", 0))
but getting null values in the result
+----------------------------------+
|ID|col_1 | col_2 | col_3| col_4 |
+----------------------------------+
|4A|||| |
|8B|||| |
+--------------------------------- +
Any input on this
Thanks in advance
EDITED
Thanks for the response, that works fine, my input getting changed a little bit, i want to group based on col1 and place values in separate rows
Updated Dataset:
+---------------------------------------------------------------------------------------------------------------------------+
|ID|DATASET |
+---------------------------------------------------------------------------------------------------------------------------+
|4A|[{"col.1":"12ABC","col.2":"141","col.3":"","col.4":"ABCD"},{"col.1":"13ABC","col.2":"141","col.3":"","col.4":"ABCD"}] |
+---------------------------------------------------------------------------------------------------------------------------+
Expected Result:
+-----------------------------------------------------------------+
|ID|col_1 | col_2 | col |
+-----------------------------------------------------------------+
|4A|"12ABC"|"{"col.2":"141","col.3":"","col.4":"ABCD"}" |
|4A|"13ABC"|"{"col.2":"141","col.3":"","col.4":"ABCD"}" |
+-----------------------------------------------------------------+
Thanks in advance