I'm implementing a Spark (1.5.2) SQL RelationProvider for a custom data source (properties files).
Can someone please explain how the automatic schema inference algorithm should be implemented?
In general, you need to create a StructType that represents your schema. A StructType contains an Array[StructField], where each element of the array corresponds to a column in your schema. A StructField can have any supported DataType -- including another StructType for nested schemas.
Creating a schema can be as simple as:
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("col1", StringType),
  StructField("col2", LongType)
))
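Nesting works the same way: you simply use another StructType as a field's data type. For example (the name and owner columns here are just hypothetical illustrations):

val nestedSchema = StructType(Array(
  StructField("name", StringType),
  StructField("owner", StructType(Array(
    StructField("id", LongType),
    StructField("email", StringType)
  )))
))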
If you want to generate a schema from a complex dataset -- one that includes nested StructTypes -- then you most likely need to create a recursive function. A good example of what such a function looks like can be found in the spark-avro integration library: its toSqlType function takes an Avro schema and converts it into a Spark StructType.
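For a properties-file source specifically, here is a minimal sketch of what value-based inference could look like. The helper names inferValueType and inferSchema are hypothetical, and splitting dotted keys (e.g. db.host) into nested structs is an assumption about how you want to map properties to columns -- adjust to your actual format:

import org.apache.spark.sql.types._

// Hypothetical helper: infer the narrowest DataType that a sample
// string value parses as, falling back to StringType.
def inferValueType(value: String): DataType = {
  if (scala.util.Try(value.toLong).isSuccess) LongType
  else if (scala.util.Try(value.toDouble).isSuccess) DoubleType
  else if (scala.util.Try(value.toBoolean).isSuccess) BooleanType
  else StringType
}

// Hypothetical helper: recursively build a StructType from property keys,
// treating the part before the first '.' as a nested struct name.
def inferSchema(props: Map[String, String]): StructType = {
  val (nested, flat) = props.partition { case (k, _) => k.contains(".") }
  val flatFields = flat.map { case (k, v) => StructField(k, inferValueType(v)) }
  val nestedFields = nested
    .groupBy { case (k, _) => k.takeWhile(_ != '.') }
    .map { case (prefix, group) =>
      // Strip the prefix and the dot, then recurse on the remainder.
      val children = group.map { case (k, v) => (k.drop(prefix.length + 1), v) }
      StructField(prefix, inferSchema(children))
    }
  StructType((flatFields ++ nestedFields).toArray)
}

With that sketch, something like inferSchema(Map("name" -> "app", "db.host" -> "localhost", "db.port" -> "5432")) would produce a schema along the lines of struct<name:string, db:struct<host:string, port:bigint>>. The recursion bottoms out at keys with no remaining dots, which is the same overall shape as the Avro-to-StructType conversion in spark-avro.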