
I am trying to publish data to a Kafka topic using the Confluent Schema Registry. Here is my schema registration:

schemaRegistryClient.register("primitive_type_str_avsc", new Schema.Parser().parse(
  s"""
    |{
    |  "type": "record",
    |  "name": "RecordLevel",
    |  "fields": [
    |    {"name": "id", "type":["string","null"], "default": null}
    |  ]
    |}
  """.stripMargin
  ))
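Before debugging the write path, it can help to confirm what the registry actually stored under the subject `primitive_type_str_avsc`. A minimal sketch, assuming the Confluent `kafka-schema-registry-client` library is on the classpath and using a placeholder registry URL:

```scala
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient

val registryUrl = "http://schema-registry:8081" // placeholder: your registry address
val client = new CachedSchemaRegistryClient(registryUrl, 100)

// Fetch the latest schema registered under the subject and print its exact JSON.
// This is the text the registry will compare against during lookup.
val meta = client.getLatestSchemaMetadata("primitive_type_str_avsc")
println(meta.getSchema)
```

If this call itself throws a 40401/40403, the subject was never registered against the registry instance the notebook is pointing at.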

The following case class is used to match the schema:

case class myCaseClass(id: Option[String] = None)

Here is my notebook code snippet:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType
import scala.util.Try

import spark.implicits._

val df1 = Seq("Welcome").toDF("a")
  .map(row => myCaseClass(Some(row.getAs[String]("a"))))
val cols = df1.columns

df1.select(struct(cols.map(column): _*).as('struct))
   .select(to_avro('struct, lit("primitive_type_str_avsc"), schemaRegistryAddress).as('value))
   .show()
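One thing worth checking: the registry error comes from `lookUpSubjectVersion`, i.e. the encoder derives an Avro schema from the struct column and asks the registry for the ID of a registered schema that matches it. If the derived schema differs from the registered one (for example, Spark's schema conversion uses a default record name such as `topLevelRecord` rather than `RecordLevel`), the lookup fails with 40403 even though the subject exists. A sketch to inspect what Spark derives, assuming the `spark-avro` `SchemaConverters` is available in your runtime:

```scala
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.functions._

// Derive the Avro schema Spark builds from the struct column's Catalyst type,
// then compare it field-by-field (and by record name) with the registered schema.
val structType = df1.select(struct(cols.map(column): _*)).schema.head.dataType
println(SchemaConverters.toAvroType(structType, nullable = false))
```

If the printed record name, namespace, or field types do not match the JSON registered under `primitive_type_str_avsc`, that mismatch is the likely cause of the 40403.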

I am getting the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 4 times, most recent failure: Lost task 0.3 in stage 77.0 (TID 186, 10.73.122.72, executor 3): org.spark_project.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
    at org.spark_project.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:191)
    at org.spark_project.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:218)
    at org.spark_project.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:284)
    at org.spark_project.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:272)
    at org.spark_project.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getIdFromRegistry(CachedSchemaRegistryClient.java:78)
    at org.spark_project.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getId(CachedSchemaRegistryClient.java:205)
    at org.apache.spark.sql.avro.SchemaRegistryClientProxy.getId(SchemaRegistryClientProxy.java:52)
    at org.apache.spark.sql.avro.SchemaRegistryAvroEncoder.encoder(SchemaRegistryUtils.scala:97)
    at org.apache.spark.sql.avro.CatalystDataToAvroWithSchemaRegistry.nullSafeEval(CatalystDataToAvroWithSchemaRegistry.scala:57)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:544)

Could you please help me resolve this issue? Thanks in advance.

cristen
  • Even I'm facing the same issue with default values in to_avro using databricks. Can someone please answer. Thx – Chandra Aug 08 '19 at 13:32

0 Answers