TL;DR It's not possible directly (especially with the old Spark 1.6), but not impossible either.
Kafka stores bytes, and bytes are what Spark Streaming receives. You'd have to pass some extra information in fixed fields to carry the schema (possibly as a JSON-encoded string) and use it to decode the other field. That is not available out of the box, but it is certainly doable.
As a suggestion, I'd send messages where the value field is always a two-field data structure: the schema (of the value field) and the value itself, both JSON-encoded, as sketched below.
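A minimal sketch of such an envelope, assuming hypothetical field names `schema` and `payload` (the inner schema string uses Spark's JSON schema representation):

```scala
// Hypothetical envelope for the Kafka message value: one field carries the
// schema (Spark's JSON schema representation), the other the actual record.
val kafkaValue =
  """{
    |  "schema": "{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":false,\"metadata\":{}}]}",
    |  "payload": "{\"id\":42}"
    |}""".stripMargin
```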
You could then use one of the `from_json` functions:
`from_json(e: Column, schema: StructType): Column`
Parses a column containing a JSON string into a `StructType` with the specified schema.
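On Spark 2.1.0 or later this is straightforward; a sketch with hypothetical column and field names:

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Hypothetical: `messages` is a DataFrame with a JSON string column "payload".
val schema = new StructType().add("id", LongType).add("name", StringType)
val parsed = messages.withColumn("parsed", from_json(col("payload"), schema))
```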
Since `from_json` was only added in Spark 2.1.0, on Spark 1.6 you'd have to register your own custom user-defined function (UDF) that deserializes the string value into the corresponding structure (just see how `from_json` does it and copy the approach).
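A minimal sketch of such a UDF for Spark 1.6, assuming a hypothetical fixed payload shape (json4s ships with Spark, and a Scala UDF that returns a case class surfaces as a struct column):

```scala
import org.apache.spark.sql.SQLContext
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

// Hypothetical payload shape; adjust the fields to your actual data.
case class Payload(id: Long, name: String)

def registerFromJsonUdf(sqlContext: SQLContext): Unit =
  sqlContext.udf.register("from_json_udf", (json: String) => {
    // json4s is bundled with Spark 1.6, so no extra dependency is needed.
    implicit val formats: Formats = DefaultFormats
    parse(json).extract[Payload]
  })
```

After registration, something like `selectExpr("from_json_udf(value) AS parsed")` gives you a struct column you can query with dot notation.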
Note that the `DataType` object comes with a `fromJson` method that can "map" a JSON-encoded string into a `DataType` that describes your schema.
`fromJson(json: String): DataType`
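So you could take the schema string out of the message's first field and turn it into a proper `StructType`; a sketch with a made-up schema:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// The JSON-encoded schema carried in the message's schema field
// (this particular schema is made up for illustration).
val schemaJson =
  """{"type":"struct","fields":[
    |  {"name":"id","type":"long","nullable":false,"metadata":{}},
    |  {"name":"name","type":"string","nullable":true,"metadata":{}}
    |]}""".stripMargin

// fromJson returns a DataType; a struct schema can be cast to StructType.
val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]
```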