1

I have json data like

{
"labels1":
     {"A":1,"B":2, "C":3},
"labels2":
     {"A":1,"B":2, "C":3},
}

and I want 3 output columns that say tagname, keyname,value. The final output will be like

tagname,key,value
labels1,A,1
labels1,B,2
labels1,C,3
labels2,A,1
labels2,B,2
labels2,C,3

How can I achieve this usecase, also the keys A,B,C are just sample and there can be multiple optional fields. Thanks in advance and let me know if any more information is required.

1 Answers1

0

Try with inbuilt pyspark functions for this case like stack and unnest the struct to add as new columns.

Example:

from pyspark.sql.functions import *
json = """{"labels1":{"A":1,"B":2, "C":3},"labels2":{"A":1,"B":2, "C":3}}"""
df = spark.read.json(sc.parallelize([json]), multiLine=True)
df.select(expr("stack(2,'labels1',labels1,'labels2',labels2)")).\
  select(col("col0").alias("tagname"),col("col1.*")).\
  select("tagname",expr("stack(3,'A',A,'B',B,'C',C) as (key,value)")).show()

#+-------+---+-----+
#|tagname|key|value|
#+-------+---+-----+
#|labels1|  A|    1|
#|labels1|  B|    2|
#|labels1|  C|    3|
#|labels2|  A|    1|
#|labels2|  B|    2|
#|labels2|  C|    3|
#+-------+---+-----+

Another way by using unpivot function:

df.withColumn("n",lit(1)).\
  unpivot("n",["labels1", "labels2"],"new","new1").select(col("new").alias("tagname"),col("new1.*")).\
  unpivot("tagname",["A","B","C"],"key","value").\
  show()
samkart
  • 6,007
  • 2
  • 14
  • 29
notNull
  • 30,258
  • 4
  • 35
  • 50
  • Thank you so much for the response, but the issue is I have given only sample data. The keys A,B,C... can be optional and many more keys can be there sometimes. I have edited the question too now. Could you let me know how should I approach in that case. – Amaravathi Satya Jul 10 '23 at 12:08
  • you can create dynamic stack statement based on `df.columns` as explained here:https://stackoverflow.com/a/76314739/7632695 – notNull Jul 10 '23 at 12:13
  • if the answer helped you to solve the issue, take a moment to accept and upvote to close this thread as solved.! https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – notNull Jul 10 '23 at 13:00
  • Thanks a lot :) it worked!! Also do you think if there is any other way possible. – Amaravathi Satya Jul 10 '23 at 15:00
  • AFAIK i think these are the possible ways, u may try with melt function from pyspark and i believe that is same as unpivot as mentioned in the answer! – notNull Jul 10 '23 at 15:27