I am working with a database whose data is stored in CSV format. The data looks like this:
id | containertype | size |
---|---|---|
1 | CASE | {height=2.01, length=1.07, width=1.22} |
2 | PALLET | {height=1.80, length=1.07, width=1.23} |
I want to parse the data inside the `size` column and create a PySpark DataFrame like:
id | containertype | height | length | width |
---|---|---|---|---|
1 | CASE | 2.01 | 1.07 | 1.22 |
2 | PALLET | 1.80 | 1.07 | 1.23 |
I tried parsing the string to StructType and MapType, but neither approach worked. Is there a way to do this without messy string manipulation?
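Reproducible DataFrame code:

    from pyspark.sql import SparkSession

    # Reuse the active session (e.g. in a notebook) or start a local one.
    spark = SparkSession.builder.getOrCreate()

    # `size` arrives as a plain string, one row per container.
    df = spark.createDataFrame(
        [
            ("1", "CASE", "{height=2.01, length=1.07, width=1.22}"),
            ("2", "PALLET", "{height=1.80, length=1.07, width=1.23}"),
        ],
        ["id", "containertype", "size"],
    )
    df.printSchema()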
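
To pin down the result I am after independently of Spark, here is a minimal plain-Python sketch of the intended parsing. It is only a reference for the expected rows, not a Spark solution. `parse_size` and `flatten` are hypothetical helper names, and I am assuming every `size` value has the exact `{key=value, ...}` shape shown in the table.

```python
# Plain-Python reference for the intended parsing (no Spark involved).
# `parse_size` and `flatten` are hypothetical helpers, not library functions.

def parse_size(size):
    """Turn '{height=2.01, length=1.07, width=1.22}' into a dict of floats."""
    inner = size.strip().strip("{}")  # drop the surrounding braces
    pairs = (item.split("=", 1) for item in inner.split(","))
    return {key.strip(): float(value) for key, value in pairs}


def flatten(row):
    """Expand (id, containertype, size) into (id, containertype, height, length, width)."""
    row_id, container_type, size = row
    dims = parse_size(size)
    return (row_id, container_type, dims["height"], dims["length"], dims["width"])


# Same rows as the table at the top of the question.
rows = [
    ("1", "CASE", "{height=2.01, length=1.07, width=1.22}"),
    ("2", "PALLET", "{height=1.80, length=1.07, width=1.23}"),
]

flat_rows = [flatten(row) for row in rows]

for row in flat_rows:
    print(row)
```

Within Spark SQL, the built-in `str_to_map` function, applied after stripping the braces, may be one candidate for producing a map column from such strings, but I have not verified that it handles this format.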