We are trying to connect to a remote Oracle database running as AmazonRDS using SSO wallet configured at our end and Apache Spark. We are able to load the data using the spark-shell
utility as mentioned below
Start the spark shell with jdbc and oraclepki jar added to the classpath
spark-shell --driver-class-path /path/to/ojdbc8.jar:/path/to/oraclepki.jar
This is the JDBC url used:
val JDBCURL="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=www.example.aws.server.com)(PORT=1527))(CONNECT_DATA=(SID=XXX))(SECURITY = (SSL_SERVER_CERT_DN =\"C=US,ST=xxx,L=ZZZ,O=Amazon.com,OU=RDS,CN=www.xxx.aws.zzz.com\")))"
And below is the Spark jdbc call to load the data
spark.read.format("jdbc").option("url",JDBCURL)
.option("user","USER")
.option("oracle.net.tns_admin","/path/to/tnsnames.ora")
.option("oracle.net.wallet_location","(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet/)))")
.option("password", "password")
.option("javax.net.ssl.trustStore","/path/to/cwallet.sso")
.option("javax.net.ssl.trustStoreType","SSO")
.option("dbtable",QUERY)
.option("driver", "oracle.jdbc.driver.OracleDriver").load
But when we are trying to run it using the spark-submit
command we are getting the below error:
Exception in thread "main" java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:774)
at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:688)
...
...
...
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:523)
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:521)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:660)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:286)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1438)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:518)
... 28 more
Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
at oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:597)
at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:143)
at oracle.net.nt.ConnOption.connect(ConnOption.java:161)
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470)
... 33 more
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:642)
at oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:580)
... 36 more
Caused by: java.security.KeyStoreException: SSO not found
at java.security.KeyStore.getInstance(KeyStore.java:851)
at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:628)
... 37 more
Caused by: java.security.NoSuchAlgorithmException: SSO KeyStore not available
at sun.security.jca.GetInstance.getInstance(GetInstance.java:159)
at java.security.Security.getImpl(Security.java:695)
at java.security.KeyStore.getInstance(KeyStore.java:848)
I am very new to spark and might be doing something wrong here. This is how I am trying to configure the Config
val conf = new SparkConf().setAppName(JOB_NAME)
conf.set("javax.net.ssl.trustStore", "/path/to/cwallet.sso");
conf.set("javax.net.ssl.trustStoreType", "SSO")
conf.set("oracle.net.tns_admin", "/path/to/tnsnames.ora")
conf.set("oracle.net.wallet_location", "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet/dir/)))")
conf.set("user", "user")
conf.set("password", "pass")
Below is the spark-submit
command used
spark-submit --class fully.qualified.path.to.main \
--jars /path/to/ojdbc8.jar,/path/to/oraclepki.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
--deploy-mode client --files /path/to/hive-site.xml --master yarn \
--driver-memory 12G \
--conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO" \
--executor-cores 4 --executor-memory 12G \
--num-executors 20 /path/to/application.jar /path/to/application_custom_config.conf
Also tried to add
--conf 'spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO'
and
--files /path/to/cwallet.sso,/path/to/tnsnames.ora
to the spark-submit
command but without any luck.
What exactly I am doing wrong here? Also tried the solution mentioned in this post but getting the same error.Do I need to make sure trustStore should be accessible on each executor node
? If that is the case then why the spark-shell
command is working fine ? Does this mean spark-cli does not include any worker nodes to execute the command ?
Please advice
UPDATE:
It looks like you're using the JDBC driver from 12.1.0.2. Please upgrade to 18.3 which you can download from oracle.com/technetwork/database/application-development/jdbc/… Some changes have been made to make the use of wallets easier. -- @Jean de Lavarene
After following the suggested change by @Jean de Lavarene got rid of the initial error but below is what I am getting now
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, example.server.net, executor 2): java.sql.SQLException: PKI classes not found. To use 'connect /' functionality, oraclepki.jar must be in the classpath: java.lang.NoClassDefFoundError: oracle/security/pki/OracleWallet
at oracle.jdbc.driver.PhysicalConnection.getSecretStoreCredentials(PhysicalConnection.java:3058)
at oracle.jdbc.driver.PhysicalConnection.parseUrl(PhysicalConnection.java:2823)
When I run this in spark local mode : --master local[*]
it works fine but failing with yarn
mode.
I am already using the --jars
command with comma separated list of jars. What I found is :
1) --jars
expect the path to be a local one, and then it copies them to the HDFS path
2) using file:///
at the beginning is not working
3) If I do not specify the --jars
parameter the program is asking for missing JDBC driver class. Once I specify the ojdbc8.jar
using --jars then the errors go away and start giving the oraclepki.jar
not found error. I have NO CLUE why this is happening.
4) Also tried using :
as the separator while specifying multiple jars but without any luck
UPDATE 2
I was able to resolve the oraclepki.jar
not found exception by using the
--driver-class-path /path/to/oraclepki.jar:/path/to/osdt_cert.jar:/path/to/others.jar
but once we are running into the --master yarn
mode then the following exception is being shown
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)
at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:322)
... 32 more
Caused by: java.io.FileNotFoundException: /path/to/cwallet.sso (No such file or directory)
As per my understanding it looks like when it launching the job from the worker node the cwallet.sso
file path is not available on those nodes. We tried to specify a HDFS path for the wallet but the utility expects a local path to be provided when creating the wallet.
So do we need to manually copy the wallet file to all worker nodes ? Or is there any better alternatives to achieve this?
Please advice