
Example code to create the table in Apache Spark's PySpark shell:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, MapType

data2 = [({"firstname2": "John", "lastname2": "Smith"}, "36636", "M", 3000),
         ({"firstname2": "John2", "lastname2": "Smith2"}, "40288", "M", 4000)]

schema = StructType([
    StructField("name", MapType(StringType(), StringType()), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True)
])

df = spark.createDataFrame(data=data2, schema=schema)

df.write.format("delta") \
    .option("path", "hdfs://[an_ip]:9000/data-warehouse/test2") \
    .mode("overwrite") \
    .saveAsTable("test2")

In Power BI Desktop, within "Get Data", we fetch the table using this connector (Power Query M):

let
    Source = fn_ReadDeltaTable(Hdfs.Files("[an_ip]:50070/data-warehouse/test2"), [UseFileBuffer=true]),
    test2 = Table.ExpandTableColumn(Source, "name", {"firstname2", "lastname2"}, {"name.firstname2", "name.lastname2"})
in
    test2

But this fails with the following error:

[screenshot of the error]

Trying Table.ExpandRecordColumn() instead produces a different error:

[screenshot of the error]

Although the issue can be worked around by using a (nested) StructType/StructField instead of MapType, I wonder whether there is a solution if we insist on using MapType.

JoyfulPanda
