
I am trying to connect to S3-compatible storage provided by MinIO from Spark, but it reports that the bucket minikube does not exist, even though I have already created it.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("AliceProcessingTwentyDotTwo")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .master("local[1]")
    .getOrCreate()
  val sc= spark.sparkContext
  sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  sc.hadoopConfiguration.set("fs.s3a.endpoint", "http://localhost:9000")
  sc.hadoopConfiguration.set("fs.s3a.access.key", "minioadmin")
  sc.hadoopConfiguration.set("fs.s3a.secret.key", "minioadmin")
  sc.hadoopConfiguration.set("fs.s3`a`.path.style.access", "true")
  sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled","false")  
  sc.textFile("""s3a://minikube/data.json""").collect()

I am using the following guide to connect.

https://github.com/minio/cookbook/blob/master/docs/apache-spark-with-minio.md

These are the dependencies I used in scala.

"org.apache.spark" %% "spark-core" % "2.4.0",
"org.apache.spark" %% "spark-sql" % "2.4.0",
"com.amazonaws" % "aws-java-sdk" % "1.11.712",
"org.apache.hadoop" % "hadoop-aws" % "2.7.3",

Sumit G
  • 1) At which point does the error appear? Be more precise. 2) This bit: ("fs.s3`a`.path.style.access", "true") you have backticks in there, is that in your original code as well? They don't really belong there. – frandroid Mar 09 '20 at 22:16
  • @frandroid The backticks around s3a are a copy-paste error. The error appears at sc.textFile(). – Sumit G Mar 10 '20 at 15:30
  • 1. Cut "fs.s3a.impl"; that's just some superstition passed down by others. 2. Use the same version of the AWS SDK that the five-year-old version of Hadoop you are using was built against: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.0 – stevel Mar 11 '20 at 10:49
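
One way to act on the last comment is to stop pinning aws-java-sdk separately and let hadoop-aws bring in the SDK version it declares itself (the 2.7.x line depends on aws-java-sdk 1.7.4). The fragment below is only a sketch of that idea for a build.sbt; it has not been tested against the asker's project, and the Hadoop client version that Spark 2.4.0 pulls in may also need to be aligned with hadoop-aws.

```scala
// build.sbt fragment (sketch, not a complete build definition)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.0",
  "org.apache.spark" %% "spark-sql"  % "2.4.0",
  // No explicit aws-java-sdk entry: hadoop-aws pulls in the SDK
  // version it was built against as a transitive dependency.
  "org.apache.hadoop" % "hadoop-aws" % "2.7.3"
)
```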

1 Answer


Try Spark 2.4.3 built without Hadoop, and use Hadoop 2.8.2 or 3.1.2. After following the steps in the link below, I was able to connect to MinIO using the CLI.

https://www.jitsejan.com/setting-up-spark-with-minio-as-object-storage.html
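
A likely reason the Hadoop version matters: the fs.s3a.path.style.access option was only added in Hadoop 2.8.0, so hadoop-aws 2.7.3 ignores it and addresses the bucket as a host name, which a local MinIO server generally does not resolve. As a minimal sketch, assuming Hadoop 2.8 or later is on the classpath and MinIO is running with the endpoint and credentials from the question, the same settings can be passed through the session builder. The object name MinioReadSketch and the s3aSettings map are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object MinioReadSketch {
  // S3A options for a local MinIO server. Endpoint, keys and bucket are the
  // values from the question; adjust them for your own setup.
  val s3aSettings: Map[String, String] = Map(
    "fs.s3a.endpoint"               -> "http://localhost:9000",
    "fs.s3a.access.key"             -> "minioadmin",
    "fs.s3a.secret.key"             -> "minioadmin",
    "fs.s3a.path.style.access"      -> "true", // honoured from Hadoop 2.8.0 on
    "fs.s3a.connection.ssl.enabled" -> "false"
  )

  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder()
      .appName("MinioReadSketch")
      .master("local[1]")

    // Spark copies every "spark.hadoop.*" property into the Hadoop
    // configuration, so no separate hadoopConfiguration.set calls are needed.
    val spark = s3aSettings
      .foldLeft(builder) { case (b, (key, value)) =>
        b.config(s"spark.hadoop.$key", value)
      }
      .getOrCreate()

    // Read a few lines from the bucket to confirm the connection works.
    spark.sparkContext
      .textFile("s3a://minikube/data.json")
      .take(5)
      .foreach(println)

    spark.stop()
  }
}
```

Note that fs.s3a.impl is not set here, in line with the comment on the question that it is unnecessary.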