I am trying to load data into MinIO storage using Spark.
Below is the Spark program -
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import datetime
from pyspark.sql import Window, functions as F
spark = SparkSession.builder.appName("MinioTest").getOrCreate()
sc = spark.sparkContext
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "https://minioendpoint.com/")
spark.conf.set("spark.hadoop.fs.s3a.access.key", "username")
spark.conf..set("spark.hadoop.fs.s3a.secret.key", "password" )
spark.conf..set("spark.hadoop.fs.s3a.path.style.access", True)
spark.conf..set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
df = spark.read.csv('s3a://bucketname/spark-operator-on-k8s/data/input/input.txt',header=True)
df.write.format('csv').options(delimiter='|').mode('overwrite').save('s3a://bucketname/spark-operator-on-k8s/data/output/')
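One detail that may matter here: spark.conf.set() calls made after the SparkSession is created do not always propagate spark.hadoop.* values into the Hadoop configuration that the S3A filesystem reads. Below is a minimal sketch (same placeholder endpoint and credentials as above) that writes the options into the Hadoop configuration directly - _jsc is a PySpark internal handle, but this pattern is widely used:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MinioTest").getOrCreate()

# Write the S3A options into the Hadoop configuration that the
# S3AFileSystem actually reads; Hadoop config values are plain strings.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.s3a.endpoint", "https://minioendpoint.com/")
hconf.set("fs.s3a.access.key", "username")
hconf.set("fs.s3a.secret.key", "password")
hconf.set("fs.s3a.path.style.access", "true")
hconf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")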
Spark Submit Command -
/usr/middleware/spark-3.2.0-bin-hadoop3.2/bin/spark-submit --jars /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar --driver-class-path /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar --conf spark.executor.extraClassPath="/usr/middleware/maven/hadoop-aws-3.2.0.jar:/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar" /usr/middleware/miniocerts/minio.py
Error 1 - although the access key and secret key are set in the script, I am not sure why this error is thrown -
java.nio.file.AccessDeniedException: s3a://bucketname/spark-operator-on-k8s/data/output: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
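The provider list in this message shows S3A walking its entire credential chain and finding nothing, which is consistent with the spark.conf.set() values never reaching the filesystem. As a debugging aid, a hedged sketch that pins S3A to the configured key/secret only (the provider class is the standard one shipped in hadoop-aws):
# Pin S3A to the key/secret set in the configuration, so the credential
# chain does not fall through to env vars or instance profiles.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")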
Then I exported the credentials as environment variables and re-ran the same spark-submit command -
export AWS_ACCESS_KEY_ID=username
export AWS_SECRET_KEY=password
/usr/middleware/spark-3.2.0-bin-hadoop3.2/bin/spark-submit --jars /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar --driver-class-path /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar --conf spark.executor.extraClassPath="/usr/middleware/maven/hadoop-aws-3.2.0.jar:/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar" /usr/middleware/miniocerts/minio.py
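As a side note, variables exported in the submitting shell are visible to the driver but are not automatically forwarded to executor processes. A quick sketch to check both environments (reusing the spark session from the script above):
import os

# Driver-side check: did the export reach the driver process?
print("driver:", os.environ.get("AWS_ACCESS_KEY_ID"))

# Executor-side check: env vars from the submitting shell are not
# automatically forwarded to executors.
print("executors:", spark.range(1).rdd.map(
    lambda _: os.environ.get("AWS_ACCESS_KEY_ID")).collect())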
Now the error is -
java.nio.file.AccessDeniedException: s3a://bucketname/spark-operator-on-k8s/data/input/input.txt: getFileStatus on s3a://bucketname/spark-operator-on-k8s/data/input/input.txt: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: XX9TMS6ANGYZXXKN; S3 Extended Request ID: t4kasdasda=; Proxy: null), S3 Extended Request ID: t4kYUfgfSAnw7ymP:403 Forbidden at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
mc S3 policy - the access policy for the user is:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucketname",
                "arn:aws:s3:::bucketname/*"
            ]
        }
    ]
}
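To take Spark out of the picture, a standalone check of the same credentials against the MinIO endpoint could look like the sketch below - boto3 is my assumption here (any S3 client would do), and the endpoint/credentials are the placeholders from above:
import boto3

# Standalone sanity check: same endpoint and credentials as the Spark
# job, no Spark involved, so a 403 here points at the key or the policy.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minioendpoint.com/",
    aws_access_key_id="username",
    aws_secret_access_key="password",
)
print(s3.list_objects_v2(Bucket="bucketname",
                         Prefix="spark-operator-on-k8s/data/input/"))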
Any pointers on what could be causing the error?