There are many examples of this; here are the essentials.
Read a compressed file:

val rdd = sc.textFile("s3://bucket/lahs/blahblah.*.gz")
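Spark decompresses .gz files transparently, so no extra options are needed; just note that gzip files are not splittable, so each file ends up in a single partition. The same read through the Dataset API, as a sketch reusing the placeholder path above:

val ds = spark.read.textFile("s3://bucket/lahs/blahblah.*.gz")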
Without your code it's hard to be specific, but here is an outline for reading the data and writing it back.
The rest is adapted from this answer:
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder()
  .appName("myKfconsumer")
  .master("local[*]")
  .getOrCreate()
// ... create your schema, for example:
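// a minimal sketch of a schema; yourFileSchema and its fields are
// hypothetical placeholders, adjust them to the layout of your JSON files
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
val yourFileSchema = StructType(Seq(
  StructField("id", LongType),
  StructField("payload", StringType)
))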
// your path
val filePath = "file:///tmp/spark/Blah/blah"
// the micro-batch of paths arrives below as someBatchData
// now read the stream; your schema is used to read each file and write it back in the foreachBatch section below
import spark.implicits._
val kafkaStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "blah")
  .option("startingOffsets", "latest")
  .option("failOnDataLoss", "true") // stop and debug it
  .load()
  .selectExpr("CAST(value AS STRING)") // the Kafka value column is binary; cast it before .as[String]
  .as[String]
kafkaStream.writeStream.foreachBatch { (someBatchData: Dataset[String], batchId: Long) =>
  // bring the paths of this micro-batch to the driver
  val records = someBatchData.collect()
  // go through all the records
  records.foreach { (path: String) =>
    val yourData = spark.read.schema(yourFileSchema).json(path)
    // write it back as you wanted, for example:
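    // a minimal sketch under the assumption that you want the JSON
    // appended under filePath from above; the mode and format are placeholders
    yourData.write.mode("append").json(filePath)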
  }
}.start()
spark.streams.awaitAnyTermination()
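Note that the Kafka source is not bundled with core Spark; add the spark-sql-kafka-0-10 connector to the classpath, e.g. spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<your-spark-version>.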