
I am trying to access some tables in RDS using PySpark on EMR.

I tried installing the JDBC drivers in /usr/share/java, but it looks like Spark is not picking them up:

from pyspark.sql import SparkSession

hostname = "rds_host"
jdbcPort = 3306
dbname = 'demo'
username = 'user'
password = 'pass'
table = "demo_table"

jdbc_url = "jdbc:mysql://{0}:{1}/{2}".format(hostname, jdbcPort, dbname)

connectionProperties = {
  "user" : username,
  "password" : password
}

spark = SparkSession.builder.getOrCreate()
my_df = spark.read.jdbc(url=jdbc_url, table=table, properties=connectionProperties)
my_df.show()




ERROR:
py4j.protocol.Py4JJavaError: An error occurred while calling o66.jdbc.
: java.sql.SQLException: No suitable driver
Santosh Santu (edited by John Rotenstein)

1 Answer


Add mysql-connector-java-*.jar to the classpath in one of these ways:

  • When initializing the pyspark shell, pass the jar with `--jars <jar_path>`.

  • For spark-submit, use the `--driver-class-path <jar_path>` argument.

Then, in connectionProperties, add `"driver": "com.mysql.jdbc.Driver"` so Spark knows which driver class to load.
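Putting the pieces together, a minimal sketch of the connection setup with the driver class named explicitly (hostname, port, database, credentials, and table name below are placeholder values, not ones from your environment):

```python
# Build the JDBC URL and connection properties for spark.read.jdbc.
# All values here are placeholders; substitute your RDS endpoint and credentials.
hostname = "rds_host"
jdbcPort = 3306
dbname = "demo"

jdbc_url = "jdbc:mysql://{0}:{1}/{2}".format(hostname, jdbcPort, dbname)

connectionProperties = {
    "user": "user",
    "password": "pass",
    # Naming the driver class explicitly avoids "No suitable driver"
    "driver": "com.mysql.jdbc.Driver",
}

# With a SparkSession in scope and the connector jar on the classpath
# (via --jars or --driver-class-path), the read would look like:
# my_df = spark.read.jdbc(url=jdbc_url, table="demo_table",
#                         properties=connectionProperties)
```

Launch the shell with the jar visible to both driver and executors, e.g. `pyspark --jars /usr/share/java/mysql-connector-java-8.0.x.jar` (adjust the path to your actual jar).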

notNull