
I installed MinIO in Kubernetes using Helm, with TLS enabled through a self-signed certificate. Previously I was able to run my Spark job against MinIO without TLS. Now it is no longer possible to connect to MinIO (which is expected!).

Then I created a truststore file from the TLS certificate:

keytool -import \
  -alias tls \
  -file tls.crt \
  -keystore truststore.jks \
  -storepass "$minioTruststorePass" \
  -noprompt
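
As a quick sanity check (same file and password as above), you can list what ended up in the truststore:

# The "tls" alias imported above should show up in the listing
keytool -list \
  -keystore truststore.jks \
  -storepass "$minioTruststorePass"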

I then created a Kubernetes secret with the content of the truststore, and in spark-defaults.conf I use the following option to mount the secret so Spark can use the truststore:

spark.kubernetes.driver.secrets.minio-truststore-secret          /opt/spark/conf/minio/truststore
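
For completeness, a sketch of creating that secret from the truststore file built above (the secret name matches the property; note that the executors would need the same mount via spark.kubernetes.executor.secrets..., since spark.executor.extraJavaOptions below references the same path):

# Create the secret holding the truststore
kubectl create secret generic minio-truststore-secret \
  --from-file=truststore.jks

# Executors need the secret mounted too:
# spark.kubernetes.executor.secrets.minio-truststore-secret   /opt/spark/conf/minio/truststore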

Finally, I made all of the following changes in my spark-defaults.conf, but I still have the same problem:

spark.hadoop.fs.s3a.endpoint                                      https://smart-agriculture-minio:9000
spark.hadoop.fs.s3.awsAccessKeyId                                 <s3aAccessKey>
spark.hadoop.fs.s3.awsSecretAccessKey                             <s3aSecretKey>
spark.hadoop.fs.s3.impl                                           org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key                                    <s3aAccessKey>
spark.hadoop.fs.s3a.secret.key                                    <s3aSecretKey>
spark.hadoop.fs.s3a.path.style.access                             true
spark.hadoop.fs.s3a.impl                                          org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.connection.ssl.enabled                        true
spark.driver.extraJavaOptions                                      -Djavax.net.ssl.trustStore=/opt/spark/conf/minio/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=<minioTruststorePass>
spark.executor.extraJavaOptions                                   -Djavax.net.ssl.trustStore=/opt/spark/conf/minio/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=<minioTruststorePass>
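
As a debugging aside, one way to check which certificate the endpoint actually presents (hostname and port taken from fs.s3a.endpoint above; run from a pod that can resolve the service name):

# The certificate's CN/SAN must cover the hostname used in fs.s3a.endpoint
openssl s_client -connect smart-agriculture-minio:9000 -showcerts </dev/null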

Have you ever faced this problem, and do you have an idea how to solve it?

Thanks

Yassir S
  • No personal experience, sorry. Underneath it all the Apache HTTP client 4.4.x is being used, and then JSSE. I'd have expected those javax options to work. Be less ambitious: get the "hadoop fs -ls" command to work first (see the sketch after these comments). – stevel Apr 23 '20 at 16:39
  • @Yassir did you find a solution to this? We have a similar setup and are facing a similar issue. – Ayush Goyal Aug 30 '21 at 09:49
  • Here is what I did in my own project: https://gitlab.com/ysennoun/smart-agriculture-with-k8s/-/blob/master/deploy/platform/data-processing/spark-jobs/dockerfiles/Dockerfile-es-to-parquet#L22 – Yassir S Sep 01 '21 at 17:45
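
To make that smaller test concrete, here is a sketch of it using the endpoint and truststore path from the question; the bucket name my-bucket is only an illustration, and the credentials are the same placeholders as above:

# Point the JVM that runs the shell command at the custom truststore
export HADOOP_OPTS="-Djavax.net.ssl.trustStore=/opt/spark/conf/minio/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=$minioTruststorePass"

# Then try a plain listing against MinIO over s3a
hadoop fs \
  -D fs.s3a.endpoint=https://smart-agriculture-minio:9000 \
  -D fs.s3a.path.style.access=true \
  -D fs.s3a.access.key=<s3aAccessKey> \
  -D fs.s3a.secret.key=<s3aSecretKey> \
  -ls s3a://my-bucket/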

3 Answers


Quite late, but I got the Hadoop S3/AWS connector to work with a self-signed cert by importing it into the default Java truststore:

keytool -import -trustcacerts -alias certalias \
  -noprompt -file /path/to/cert.crt \
  -keystore $JAVA_HOME/jre/lib/security/cacerts \
  -storepass changeit

changeit is the default Java cacerts password.
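
One caveat, in case you are on a newer JVM: from Java 9 onward the jre/ directory no longer exists, and keytool has a -cacerts shortcut for the default store, so the equivalent import would be:

keytool -import -trustcacerts -alias certalias \
  -noprompt -file /path/to/cert.crt \
  -cacerts -storepass changeit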

Hannah Ritter
  • Hey @Hermann, when we provide the jar from the host, the Spark job is able to access the data. However, if I try to provide it in S3 as well, it gives me error code 400, Bad Request. Any idea how to resolve that? – Ayush Goyal Aug 30 '21 at 10:00

Spark uses the Hadoop libraries, which in turn use the aws-sdk, so you can disable certificate checking (note that this turns off TLS validation entirely):

com.amazonaws.sdk.disableCertChecking=true

As I understand it, you want an answer for k8s + the Spark operator: just add this property for the driver and the executor in your YAML file:

javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"

fyi: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#specifying-extra-java-options

https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/SDKGlobalConfiguration.java#L29-L34
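
For reference, a minimal sketch of where that lands in a SparkApplication manifest for the operator linked above (the name is illustrative; only the javaOptions lines matter here):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-job   # illustrative
spec:
  driver:
    javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"
  executor:
    javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"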

The same option,

javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"

worked for me for Hive!