
Is there a way of connecting a Spark Structured Streaming Job to a Kafka cluster which is secured by SASL/PLAIN authentication?

I was thinking about something similar to:

val df2 = spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_PLAINTEXT")
    .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=...")
    .option("subscribe", "topic1")
    .load();

It seems that Spark Structured Streaming recognizes the kafka.bootstrap.servers option but not the other SASL-related options. Is there a different way?

user152468
    I just noticed that it works exactly this way once you provide all of the SASL-related configuration options. – user152468 Apr 28 '20 at 14:27

1 Answer


Here is a full example in PySpark.

For test/dev you can inline the JAAS config in your options.

options = {
    "kafka.sasl.jaas.config": 'org.apache.kafka.common.security.plain.PlainLoginModule required username="USERNAME" password="PASSWORD";',
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.security.protocol" : "SASL_SSL",
    "kafka.bootstrap.servers": bootstrap_servers,
    "group.id": group_id,
    "subscribe": topic,
}
df = spark.readStream.format("kafka").options(**options).load()
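
To keep the credentials out of the source file even in test/dev, the same options can be built from environment variables. This is only a sketch: the helper name sasl_plain_options and the KAFKA_* variable names are made up for illustration, and it deliberately rejects credentials containing quotes or backslashes rather than guessing at escaping rules. It leaves out group.id because, as far as I know, Spark's Kafka source manages consumer group ids on its own.

```python
import os


def sasl_plain_options(bootstrap_servers, topic, username, password):
    """Build Kafka source options for SASL/PLAIN with an inline JAAS config."""
    for value in (username, password):
        # Conservative assumption: refuse characters that would need escaping
        # inside the double-quoted JAAS values instead of trying to escape them.
        if '"' in value or "\\" in value:
            raise ValueError("quotes and backslashes are not handled by this sketch")
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "kafka.sasl.jaas.config": jaas,
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }


if __name__ == "__main__":
    # Hypothetical environment variable names; the defaults are placeholders.
    options = sasl_plain_options(
        os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        os.environ.get("KAFKA_TOPIC", "topic1"),
        os.environ.get("KAFKA_USERNAME", "USERNAME"),
        os.environ.get("KAFKA_PASSWORD", "PASSWORD"),
    )
    # Print only the option names so the password never lands in the logs.
    print(sorted(options))
```

The resulting dict can be passed to the reader exactly as above, with spark.readStream.format("kafka").options(**options).load().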

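If you use this mode in production you're going to want your JAAS config in a file. To do that, put the same login module line into a file called jaas.conf, wrapped in a KafkaClient { ... }; section (a JAAS file needs a named section, unlike the inline option), and remove the kafka.sasl.jaas.config key from your options: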

options = {
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.security.protocol" : "SASL_SSL",
    "kafka.bootstrap.servers": bootstrap_servers,
    "group.id": group_id,
    "subscribe": topic,
}
df = spark.readStream.format("kafka").options(**options).load()

Then provide the file path to spark-submit. For example:

spark-submit \
  --driver-java-options -Djava.security.auth.login.config=/path/to/jaas.conf \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 yourapp.py

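You will need to choose the right path and versions for your application: the Scala version (2.11) and Spark version (2.4.5) in the package coordinates should match your Spark installation. Note that --driver-java-options only affects the driver JVM. On a cluster the executors need the same system property (for example via spark.executor.extraJavaOptions) and access to the file (for example by shipping it with --files).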
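
For reference, here is a rough sketch of generating that jaas.conf with the KafkaClient section the JVM looks for when java.security.auth.login.config is set. The function names render_jaas_file and write_jaas_file are hypothetical, and the sketch assumes credentials without quotes or backslashes.

```python
from pathlib import Path


def render_jaas_file(username, password):
    """Return the text of a JAAS file with a KafkaClient section for SASL/PLAIN."""
    return (
        "KafkaClient {\n"
        "  org.apache.kafka.common.security.plain.PlainLoginModule required\n"
        f'  username="{username}"\n'
        f'  password="{password}";\n'
        "};\n"
    )


def write_jaas_file(path, username, password):
    """Write the JAAS file and restrict its permissions to the owner."""
    target = Path(path)
    target.write_text(render_jaas_file(username, password))
    # The file holds a plaintext password, so keep it owner-readable only.
    target.chmod(0o600)
    return target
```

Calling write_jaas_file("/path/to/jaas.conf", "USERNAME", "PASSWORD") once on the machine that runs spark-submit would produce the file referenced in the command above.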

Carter Shanklin