
Has anyone been able to write to Kafka with this library from PySpark?

I've been able to successfully read using the code from the README documentation:

from pyspark.sql import Column

# spark_context, schema_registry_url, topic, and data_frame are assumed to be
# defined elsewhere; data_frame holds the raw Kafka records with a binary
# "value" column.
jvm_gateway = spark_context._gateway.jvm
abris_avro = jvm_gateway.za.co.absa.abris.avro
naming_strategy = getattr(getattr(abris_avro.read.confluent.SchemaManager, "SchemaStorageNamingStrategies$"), "MODULE$").TOPIC_NAME()

schema_registry_config_dict = {"schema.registry.url": schema_registry_url,
                               "schema.registry.topic": topic,
                               "value.schema.id": "latest",
                               "value.schema.naming.strategy": naming_strategy}

# Build an immutable Scala Map from the Python dict via the py4j gateway.
conf_map = getattr(getattr(jvm_gateway.scala.collection.immutable.Map, "EmptyMap$"), "MODULE$")
for k, v in schema_registry_config_dict.items():
    conf_map = getattr(conf_map, "$plus")(jvm_gateway.scala.Tuple2(k, v))

# Deserialize the Avro "value" column, then flatten the decoded struct.
deserialized_df = data_frame.select(Column(abris_avro.functions.from_confluent_avro(data_frame._jdf.col("value"), conf_map))
                  .alias("data")).select("data.*")

However, I am struggling to extend this to writing to topics via the to_confluent_avro function.
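For the write side, I expected something symmetric to the read path. Below is a sketch of what I have been trying, not working code: it assumes to_confluent_avro accepts the same (column, Scala Map) arguments as from_confluent_avro, that the columns to publish must first be packed into a single struct column, and that conf_map can be reused as-is for writing (the correct config keys for the write path may differ). kafka_bootstrap_servers is a placeholder.

from pyspark.sql.functions import struct

# Pack all columns into one struct, then serialize that struct to Confluent Avro.
to_write = deserialized_df.select(struct(*deserialized_df.columns).alias("record"))
serialized_df = to_write.select(
    Column(abris_avro.functions.to_confluent_avro(to_write._jdf.col("record"), conf_map))
    .alias("value"))

# Publish the serialized records using Spark's standard Kafka sink.
(serialized_df.write
    .format("kafka")
    .option("kafka.bootstrap.servers", kafka_bootstrap_servers)  # placeholder
    .option("topic", topic)
    .save())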

