
I'm trying to stream data from Kafka (JSON messages) and write it to AWS S3 using Apache Spark's (2.4.0) Structured Streaming API.

But I get an exception from the AWS library, without much detail.

I've tried writing to the local filesystem and to HDFS, and it works properly.

For S3, I'm able to list files via the HDFS CLI using hdfs dfs -ls s3://<bucket-name>/test/

A snippet of what I'm trying to do:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.access.key", "xxxx")
  .config("spark.hadoop.fs.s3a.secret.key", "xxxx")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")
  .getOrCreate()

val df = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test")
  .load()

// SOME ETL THEN

df.writeStream
  .outputMode("append")
  .option("checkpointLocation", "s3a://<bucket-name>/test/")
  .format("json")
  .option("path", "s3a://<bucket-name>/test/")
  .start()

The exception I get:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Error Code: null, AWS Error Message: Bad Request
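One possible lead (an assumption, not a confirmed fix for this post): ap-south-1 is one of the newer AWS regions that only accepts Signature Version 4 requests, and a bare 400 Bad Request is what S3 returns when a client signs with the older V2 scheme. The AWS Java SDK exposes a system property to force V4 signing; it has to be set before the S3A filesystem is first initialised, i.e. before getOrCreate() in the snippet above. A minimal sketch:

```scala
// Hedged sketch: force AWS Signature V4 in the AWS Java SDK. The property
// name is the SDK's com.amazonaws.SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY.
// Set it before building the SparkSession so S3A picks it up on first use.
System.setProperty("com.amazonaws.services.s3.enableV4", "true")
```

On a cluster (rather than local[*]) the same property would instead be passed to the JVMs via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions.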

  • @PabloLópezGallego I created a new bucket in the Ireland region, and it seems able to access S3, because there were no errors, only warnings, and I can see the metadata created in the bucket. It's still not able to write any of the stream data, though (which I think must be a different problem). – Ashish Taldeokar May 21 '19 at 07:24
  • Ok, sorry, I couldn't help you – Pablo López Gallego May 21 '19 at 07:38
