I am trying to access some tables in RDS from PySpark on EMR.
I installed the JDBC drivers in /usr/share/java, but it looks like Spark is not picking them up:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

hostname = "rds_host"
jdbcPort = 3306
dbname = 'demo'
username = 'user'
password = 'pass'
table = "demo_table"

jdbc_url = "jdbc:mysql://{0}:{1}/{2}".format(hostname, jdbcPort, dbname)

connectionProperties = {
    "user": username,
    "password": password
}

my_df = spark.read.jdbc(url=jdbc_url, table=table, properties=connectionProperties)
my_df.show()
ERROR:

py4j.protocol.Py4JJavaError: An error occurred while calling o66.jdbc.
: java.sql.SQLException: No suitable driver
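For context, this is how I have been launching the job. My understanding is that jars dropped in /usr/share/java are not automatically on Spark's classpath, so I also tried passing the connector jar explicitly; the jar path and file name below are from my setup and may differ on other clusters:

# Sketch of the submit command I tried; the jar path/name is an
# assumption from my environment (MySQL Connector/J installed via yum).
# --jars ships the jar to the executors, --driver-class-path puts it
# on the driver's classpath so DriverManager can find it.
spark-submit \
  --jars /usr/share/java/mysql-connector-java.jar \
  --driver-class-path /usr/share/java/mysql-connector-java.jar \
  my_script.py

Is this the right way to make Spark see the driver, or does EMR expect the jar somewhere else?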