
I'm trying to use the from_avro function on a DataFrame.

This DataFrame originates from a streaming read on Kafka, and at some point I create one column with the schema ID (related to Schema Registry) and another with the message.
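
Roughly like this, a minimal sketch assuming the Kafka value follows the Confluent wire format (the column names are just illustrative):

```python
from pyspark.sql import functions as F

# Confluent wire format: byte 0 = magic byte, bytes 1-4 = schema ID (big-endian),
# rest = Avro payload.
df = (raw_df
      .withColumn("schemaId",
                  F.expr("cast(conv(hex(substring(value, 2, 4)), 16, 10) as int)"))
      .withColumn("message",
                  F.expr("substring(value, 6, length(value) - 5)")))
```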

I then have a UDF that takes the schema ID and calls my Schema Registry API to fetch the schema, so in the end I have a column containing the schema as well.
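
A simplified sketch of that UDF (the registry URL and column names are placeholders):

```python
import requests
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

SCHEMA_REGISTRY_URL = "https://my-schema-registry:8081"  # placeholder

@F.udf(returnType=StringType())
def fetch_schema(schema_id):
    # /schemas/ids/{id} is the Confluent Schema Registry endpoint for looking up a schema by ID.
    resp = requests.get(f"{SCHEMA_REGISTRY_URL}/schemas/ids/{schema_id}")
    resp.raise_for_status()
    return resp.json()["schema"]

df = df.withColumn("schema", fetch_schema(F.col("schemaId")))
```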

After this, I'm trying to create a new column with the decoded message by calling from_avro("message", "schema", options), but my "schema" is a column and from_avro expects a plain string. I have tried everything to turn the column into a string, but I just get other errors such as "Column is not iterable".
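
To make the mismatch concrete: from_avro works when the schema is a literal JSON string known on the driver (the schema below is just a placeholder), but not when it sits in a column:

```python
from pyspark.sql import functions as F
from pyspark.sql.avro.functions import from_avro

# Works: the schema is a plain JSON string available on the driver (placeholder schema).
placeholder_schema = '{"type":"record","name":"Example","fields":[{"name":"id","type":"long"}]}'
decoded = df.withColumn("decoded", from_avro(F.col("message"), placeholder_schema))

# What I actually need, but from_avro rejects a Column as the schema argument:
# decoded = df.withColumn("decoded", from_avro(F.col("message"), F.col("schema")))
```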

I also tried moving the from_avro call into a UDF, but then I ran into issues related to the fact that from_avro needs to be executed in the driver context (even though I'm using a single-node cluster).
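
For completeness, that attempt was roughly this sketch; it fails because from_avro has to build its expression through the driver-side JVM, which isn't reachable inside an executor-side Python UDF:

```python
from pyspark.sql import functions as F
from pyspark.sql.avro.functions import from_avro
from pyspark.sql.types import BinaryType

# Sketch of the failed attempt: from_avro cannot run inside a Python UDF because it
# needs the driver's SparkContext/JVM gateway, which is not available on executors.
@F.udf(returnType=BinaryType())
def decode_message(message, schema):
    return from_avro(message, schema)

# df = df.withColumn("decoded", decode_message(F.col("message"), F.col("schema")))
```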

FEST
  • The open-source version of the `from_avro` function doesn't work with Confluent-serialized data. Refer to https://github.com/AbsaOSS/ABRiS/. Otherwise, make sure you're using the one included with Databricks. – OneCricketeer Aug 23 '21 at 16:14
  • Hi @OneCricketeer, I was using the one provided by Databricks, but I'm trying to change that because I need to access the Schema Registry API via SSL, which is not an option with the Databricks one. I was able to do almost everything except figure out how they do the decoding with a DataFrame. Unfortunately, their code is not open source, so I cannot check how they are doing it. – FEST Aug 24 '21 at 14:27
  • I don't use Databricks, but I would be very surprised if they didn't support SSL, or passing any configs to the Registry client/Kafka deserializer. Have you tried contacting their support? – OneCricketeer Aug 24 '21 at 14:59
  • Hi @FEST, is this issue resolved? I'm having similar problems connecting to schema registry with SSL. – ableHercules Jan 31 '22 at 22:31

0 Answers