Example code to create the table in Apache Spark's PySpark shell:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, MapType

data2 = [
    ({"firstname2": "John", "lastname2": "Smith"}, "36636", "M", 3000),
    ({"firstname2": "John2", "lastname2": "Smith2"}, "40288", "M", 4000)
]

schema = StructType([
    StructField("name", MapType(StringType(), StringType()), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True)
])

df = spark.createDataFrame(data=data2, schema=schema)

df.write.format("delta") \
    .option("path", "hdfs://[an_ip]:9000/data-warehouse/test2") \
    .mode("overwrite") \
    .saveAsTable("test2")
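For reference, printing the schema in the same shell confirms that name is stored as a map rather than a struct (the commented lines are the output Spark prints for this schema):

df.printSchema()
# root
#  |-- name: map (nullable = true)
#  |    |-- key: string
#  |    |-- value: string (valueContainsNull = true)
#  |-- id: string (nullable = true)
#  |-- gender: string (nullable = true)
#  |-- salary: integer (nullable = true)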
In Power BI Desktop, under "Get Data", we fetch the table through this connector:
let
    Source = fn_ReadDeltaTable(Hdfs.Files("[an_ip]:50070/data-warehouse/test2"), [UseFileBuffer=true]),
    test2 = Table.ExpandTableColumn(Source, "name", {"firstname2", "lastname2"}, {"name.firstname2", "name.lastname2"})
in
    test2
But I got this issue:
If I try to use Table.ExpandRecordColumn() instead, here is another issue:
Although the issue can be worked around by using (nested) StructType/StructField instead of MapType, I wonder whether there is a solution if we insist on using MapType.
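For illustration, here is a minimal sketch of that nested StructType/StructField workaround, reusing the identifiers from the example above (only the shape of the schema changes; the field names are fixed at write time instead of being data-dependent map keys):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Same rows, but "name" is now a struct with two fixed fields instead of a map.
data2 = [
    (("John", "Smith"), "36636", "M", 3000),
    (("John2", "Smith2"), "40288", "M", 4000)
]

schema = StructType([
    StructField("name", StructType([
        StructField("firstname2", StringType(), True),
        StructField("lastname2", StringType(), True)
    ]), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True)
])

df = spark.createDataFrame(data=data2, schema=schema)

With this schema, the name column arrives in Power BI as a record with known fields, which presumably is why the expansion succeeds, but it gives up the open-ended keys that MapType provides.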