
I'm trying to connect to a database from a PySpark notebook in Azure Synapse Analytics using a JDBC driver. My code raises a Py4JJavaError, but I can't see the full error output: the remaining rows of the stack trace are hidden behind "... 27 more".

I checked the logs of the Spark pool, but they show the same output, with the remaining 27 rows hidden.

I also tried different options in the notebook, for example the magic command %tb and setting the SparkContext log level with setLogLevel("DEBUG") and setLogLevel("INFO"), but I still have the same problem.

Thank you for your help.

datatalian

1 Answer


I tried to replicate the issue by connecting to a database from a Synapse notebook in my environment with the code below:

Server = "<serverName>"
Port = 1433
Database = "<databaseName>"
Username = "<userName>"
Password = "<Password>"
jdbcDriver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
table = "customers"
jdbcUrl = f"jdbc:sqlserver://{Server}:{Port};databaseName={Database}"
df1 = spark.read.format("sqlserver").option("driver", jdbcDriver).option("url", jdbcUrl).option("dbtable", table).option("user", Username).option("password", Password).load()
df1.show()

I got the error in the format below:

[Screenshot of the truncated error output]

I tried many ways to expand the hidden lines of the error, but none of them worked. So I ran the same code on another platform, Databricks, to learn more about the error. This is the error I got in the Databricks notebook:

[Screenshot of the error in the Databricks notebook]

According to the error, I changed the format to 'jdbc':

df1 = spark.read.format("jdbc").option("driver", jdbcDriver).option("url", jdbcUrl).option("dbtable", table).option("user", Username).option("password", Password).load()

After this correction, the code connected to the database successfully.

[Screenshot of the successful connection]
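
On the hidden lines themselves: the "... 27 more" line comes from how Java prints a chained exception, where frames that a cause shares with the exception wrapping it are omitted. A possible workaround, sketched here under the assumption that the error is a Py4JJavaError exposing the underlying Java Throwable as java_exception, is to walk the cause chain and print every frame explicitly. The helper name format_java_throwable is made up for this example, and the sketch is untested on a Synapse Spark pool.

```python
def format_java_throwable(throwable, max_depth=50):
    """Return the full stack trace of a Java Throwable and all of its causes.

    `throwable` is expected to behave like java.lang.Throwable, i.e. to offer
    toString(), getStackTrace() and getCause(). Hypothetical helper, not part
    of PySpark or Py4J.
    """
    lines = []
    prefix = ""
    current = throwable
    depth = 0
    # max_depth is only a safety guard against an unexpectedly long chain.
    while current is not None and depth < max_depth:
        lines.append(prefix + current.toString())
        # Print every frame, including those Java would fold into "... N more".
        for frame in current.getStackTrace():
            lines.append("\tat " + frame.toString())
        current = current.getCause()  # None once the root cause is reached
        prefix = "Caused by: "
        depth += 1
    return "\n".join(lines)


# Usage in a notebook (assumes `spark`, `jdbcUrl`, `table`, etc. are defined
# as in the code above):
#
# from py4j.protocol import Py4JJavaError
# try:
#     df1 = spark.read.format("jdbc").option("url", jdbcUrl).option("dbtable", table).load()
# except Py4JJavaError as e:
#     print(format_java_throwable(e.java_exception))
```

The output is longer than the default trace because shared frames are repeated for each cause, but nothing is elided, which can help when the root cause (for example a firewall timeout) sits at the bottom of the chain.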

Bhavani
  • Thank you, Bhavani. I'm already using the 'jdbc' format. In my case we think it could be a firewall issue, so that's what I'm trying to solve, but from a Synapse point of view it would be great to have all the rows in the output, or at least in the cluster logs. Thanks for sharing the insight that it works in Databricks. – datatalian May 20 '23 at 12:39