Dict2Columns - PySpark

Question

I would like to expand a single column that holds dict values into one column per key. The input looks like this:

+-------+--------------------------------------------+
|    Idx|                value                       |
+-------+--------------------------------------------+
|    123|{'country_code': 'gb','postal_area': 'CR'}  |
|    456|{'country_code': 'cn','postal_area': 'RS'}  |
|    789|{'country_code': 'cl','postal_area': 'QS'}  |
+-------+--------------------------------------------+

I would like to get something like this:

display(df)

+-------+--------------+-------------+
|    Idx| country_code | postal_area |
+-------+--------------+-------------+
|    123| gb           | CR          |
|    456| cn           | RS          |
|    789| cl           | QS          |
+-------+--------------+-------------+

So far I have only tried this for a single row, with something like this:

# PySpark code
import json

sc = spark.sparkContext
dict_lst = {'country_code': 'gb','postal_area': 'CR'}
# Serialize the dict to a JSON string and let Spark infer the schema from it
rdd = sc.parallelize([json.dumps(dict_lst)])
df = spark.read.json(rdd)
display(df)

and I got:

+-------------+-------------+
|country_code | postal_area |
+-------------+-------------+
|    gb       |    CR       |
+-------------+-------------+

So maybe I have part of the solution here. Now I would like to know how I can concat this df with the original DataFrame to get the result shown above.

Possible duplicate of [Pyspark: explode json in column to multiple columns](https://stackoverflow.com/questions/51070251/pyspark-explode-json-in-column-to-multiple-columns) — pault, Aug 02 '19 at 13:22

score 0 · Answer 1 · answered Aug 02 '19 at 12:48

Well, after some trial and error, the best solution I found is to extract the values with PySpark's regexp_extract function:

from pyspark.sql.functions import regexp_extract

# Group 0 is the whole match; the lookarounds keep the key and quotes out of it.
df = (df.withColumn("country_code", regexp_extract("value", r"(?<=.country_code.:\s.)(.*?)(?=')", 0))
        .withColumn("postal_area", regexp_extract("value", r"(?<=.postal_area.:\s.)(.*?)(?=')", 0)))

I hope this helps anyone with future questions about getting values out of a dictionary stored as a string.
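To see what that pattern actually captures, it can be checked outside Spark. This is only a rough check: Spark evaluates the pattern with Java's regex engine, while the sketch below uses Python's `re`. For this particular pattern (fixed-width lookbehind, lazy group, lookahead on a quote) the two engines should agree. The sample strings are illustrative, with the first one taken from the question.

```python
import re

# Same pattern as in the answer, written as a raw string.
pattern = r"(?<=.country_code.:\s.)(.*?)(?=')"

value = "{'country_code': 'gb','postal_area': 'CR'}"
match = re.search(pattern, value)
country_code = match.group(0)  # group 0 = whole match, as in regexp_extract(..., 0)

# The lookbehind requires exactly one whitespace character after the colon,
# so a string formatted without that space does not match at all.
no_space = "{'country_code':'gb','postal_area':'CR'}"
no_match = re.search(pattern, no_space)
```

The second case is the main caveat of the regex approach: it depends on the exact formatting of the string, so any variation in spacing or quoting needs a looser pattern.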
