I'm trying to add comments to the fields of a schema (a schema with data definitions). My attempt is below.
I tried StructType.add() (commented out in the code)
and also StructType([StructField("field", dtype, nullable, metadata)]),
and got the error below. I'm not sure this approach can work at all. Can someone help me here? I'm new to Spark.
I'm looking for output (a schema with data definitions) like this:
df.printSchema()
root
|-- firstname: string (nullable = true) comments:val1
|-- middlename: string (nullable = true) comments:val2
|-- lastname: string (nullable = true) comments:val3
|-- id: string (nullable = true) comments:val4
|-- gender: string (nullable = true) comments:val5
|-- salary: integer (nullable = true) comments:val6
error:
IllegalArgumentException: Failed to convert the JSON string '{"metadata":"val1","name":"firstname","nullable":true,"type":"string"}' to a field.
The code I'm using to try to add comments to the fields:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
spark = SparkSession.builder.master("local[1]") \
    .appName('SparkByExamples.com') \
    .getOrCreate()
data = [("James", "", "Smith", "36636", "M", 3000),
        ("Michael", "Rose", "", "40288", "M", 4000),
        ("Robert", "", "Williams", "42114", "M", 4000),
        ("Maria", "Anne", "Jones", "39192", "F", 4000),
        ("Jen", "Mary", "Brown", "", "F", -1)
        ]
schema = StructType([
    StructField("firstname", StringType(), True, 'val1'),
    StructField("middlename", StringType(), True, 'val2'),
    StructField("lastname", StringType(), True, 'val3'),
    StructField("id", StringType(), True, 'val4'),
    StructField("gender", StringType(), True, 'val5'),
    StructField("salary", IntegerType(), True, 'val6')
])
# schema = StructType().add("firstname", StringType(), True, 'val1').add("middlename", StringType(), True, 'val2') \
#     .add("lastname", StringType(), True, 'val3').add("id", StringType(), True, 'val4').add("gender", StringType(), True, 'val5').add("salary", IntegerType(), True, 'val6')
df = spark.createDataFrame(data=data,schema=schema)
df.printSchema()
df.show(truncate=False)