Questions tagged [spark-jdbc]
78 questions
2
votes
2 answers
Check whether a table exists with Spark JDBC
I am reading some data into a DataFrame from Microsoft SQL Server using Spark JDBC. When the table does not exist (for example, it was dropped accidentally) I get an exception: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object…

Cassie
- 2,941
- 8
- 44
- 92
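A minimal sketch of one way to guard against this: probe the table with plain JDBC metadata before handing the read to Spark. The URL, credentials, schema, and table name below are placeholders, and spark is assumed to be an active SparkSession:

import java.sql.DriverManager

val url  = "jdbc:sqlserver://host:1433;databaseName=mydb"  // placeholder
val conn = DriverManager.getConnection(url, "user", "password")
try {
  // DatabaseMetaData.getTables returns one row per matching table
  val rs = conn.getMetaData.getTables(null, "dbo", "my_table", Array("TABLE"))
  if (rs.next()) {
    val df = spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", "dbo.my_table")
      .load()
  }
} finally conn.close()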
2
votes
1 answer
Spark JDBC read ends up in one partition only
I have the code snippet below for reading data from a PostgreSQL table, from which I am pulling all available data, i.e. select * from table_name:
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", self.var_dict['jdbc_url']) \
    …

Abhi
- 163
- 2
- 14
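For context, the JDBC source reads through a single task unless it is told how to split the query. A sketch of the four options that control this, with an illustrative column and bounds and a placeholder URL:

val jdbcDF = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")
  .option("dbtable", "table_name")
  .option("user", "user")
  .option("password", "password")
  .option("partitionColumn", "id")  // numeric, date, or timestamp column
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "16")
  .load()

All four options must be set together; without them Spark has no way to split the select * and falls back to one partition.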
2
votes
1 answer
How to specify Trust store and trust store type for Spark JDBC connection
I am new to Spark and we are currently using the Spark Java API to create ORC files from an Oracle database. I was able to configure the connection with
sqlContext.read().jdbc(url,table,props)
However, I couldn't find any way in the properties to specify…

Sai Kumar
- 112
- 2
- 11
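A sketch of one approach, assuming the Oracle thin driver, which accepts the standard javax.net.ssl.* keys as connection properties; the paths, credentials, and store type are placeholders, and url and table are whatever was already being passed to jdbc():

import java.util.Properties

val props = new Properties()
props.setProperty("user", "scott")      // placeholder credentials
props.setProperty("password", "tiger")
// Passed through to the driver along with the rest of the properties
props.setProperty("javax.net.ssl.trustStore", "/path/to/truststore.jks")
props.setProperty("javax.net.ssl.trustStoreType", "JKS")
props.setProperty("javax.net.ssl.trustStorePassword", "changeit")

val df = sqlContext.read.jdbc(url, table, props)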
1
vote
0 answers
Creating a partitioned table in Postgres via Spark JDBC write
I want to write a DataFrame to a Postgres table via the Spark JDBC connector. The table I am writing to in Postgres needs to be partitioned by a certain column. This is currently how I am writing it. I am running Spark 3.2.3 and Postgres 11:
val username…

sanchit08
- 119
- 1
- 7
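One hedged sketch: Spark's JDBC writer appends whatever is in createTableOptions verbatim to the CREATE TABLE it issues, and Postgres 11 accepts PARTITION BY there. The column name is illustrative, and the individual partitions still need their own DDL on the database side:

df.write.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")  // placeholder
  .option("dbtable", "my_partitioned_table")
  .option("user", username)
  .option("password", password)
  // Appended verbatim to Spark's generated CREATE TABLE statement
  .option("createTableOptions", "PARTITION BY RANGE (created_at)")
  .mode("overwrite")
  .save()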
1
vote
0 answers
How to write a Spark DataFrame into multiple JDBC tables based on a column
I'm working with a batch Spark pipeline written in Scala (v2.4). I would like to save a DataFrame into a PostgreSQL database. However, instead of saving all rows into a single table in the database, I want to save them to multiple tables based on…

IllSc
- 1,419
- 3
- 17
- 24
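A straightforward sketch, assuming the splitting column is a string, the table names can be derived from its values, and url/user/password are already defined; collect the distinct keys on the driver and write one filtered slice per table:

import org.apache.spark.sql.functions.col
import spark.implicits._

val keys = df.select("category").distinct().as[String].collect()  // illustrative column
keys.foreach { k =>
  df.filter(col("category") === k)
    .write.format("jdbc")
    .option("url", url)
    .option("dbtable", s"public.table_$k")  // table-per-key naming is an assumption
    .option("user", user)
    .option("password", password)
    .mode("append")
    .save()
}

Since each iteration rescans df, persisting it before the loop usually pays off.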
1
vote
0 answers
Override JdbcUtils `saveTable` method
How can I extend the spark-jdbc sink and override the saveTable method? I want to use one transaction for the entire DataFrame batch instead of separate transactions per…

StarScream
- 223
- 2
- 12
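JdbcUtils lives in a Spark-internal package, so rather than overriding saveTable, a common workaround is to funnel the data through one partition and manage the transaction by hand. A sketch with an illustrative target schema, assuming url, user, and password are plain strings visible to the closure:

import java.sql.DriverManager
import org.apache.spark.sql.Row

df.coalesce(1).foreachPartition { rows: Iterator[Row] =>
  val conn = DriverManager.getConnection(url, user, password)
  conn.setAutoCommit(false)  // one transaction for the whole batch
  try {
    val stmt = conn.prepareStatement("INSERT INTO target (id, name) VALUES (?, ?)")
    rows.foreach { r =>
      stmt.setLong(1, r.getLong(0))
      stmt.setString(2, r.getString(1))
      stmt.addBatch()
    }
    stmt.executeBatch()
    conn.commit()            // single commit
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally conn.close()
}

coalesce(1) serializes the write through one connection, which is the price of a single transaction.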
1
vote
1 answer
Schema capitalization (uppercase) problem when reading with Spark
Using Scala here:
val df = spark.read.format("jdbc").
  option("url", "").
  option("dbtable", "UPPERCASE_SCHEMA.table_name").
  option("user", "postgres").
  option("password", "").
  option("numPartitions", 50).
  …

Wonseok Choi
- 99
- 8
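For reference, Postgres folds unquoted identifiers to lower case while Spark passes dbtable through essentially verbatim, so quoting the mixed-case part explicitly is one fix. A sketch with placeholder connection details:

val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")
  // Quoted so Postgres keeps the upper case instead of folding it
  .option("dbtable", "\"UPPERCASE_SCHEMA\".table_name")
  .option("user", "postgres")
  .option("password", "secret")
  .load()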
1
vote
0 answers
How to get Spark metrics for the Spark JDBC writer
Versions: Scala 2.11, Spark 2.4.4
To implement this, I have created my own implementation of SparkListener and added it while creating the Spark session.
class SparkMetricListener extends SparkListener {
...
override def onTaskEnd .. {
..
//use…

VimalK
- 65
- 1
- 8
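A sketch of such a listener built on the task-end event; whether outputMetrics is actually populated for the JDBC sink depends on the Spark version, so treat recordsWritten here as something to verify rather than a given:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class SparkMetricListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val out = taskEnd.taskMetrics.outputMetrics
    println(s"task=${taskEnd.taskInfo.taskId} " +
      s"recordsWritten=${out.recordsWritten} bytesWritten=${out.bytesWritten}")
  }
}

// Registered once, e.g. right after the session is built:
// spark.sparkContext.addSparkListener(new SparkMetricListener())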
1
vote
1 answer
Spark SQL: INSERT statement with JDBC does not support default values
I am trying to read/write data from other databases using JDBC, just following the doc https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html.
But I found Spark SQL does not work well with DEFAULT values or AUTO_INCREMENT columns:
CREATE TEMPORARY…

shiyuhang
- 31
- 5
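The usual workaround, sketched under the assumption that the DEFAULT/AUTO_INCREMENT column can simply be omitted: write a DataFrame containing only the remaining columns. Spark's generated INSERT lists just the DataFrame's columns, so the database fills in the rest:

import spark.implicits._

// "id" is assumed to be the AUTO_INCREMENT column, so it is left out entirely
val rows = Seq("alice", "bob").toDF("name")
rows.write.format("jdbc")
  .option("url", url)
  .option("dbtable", "people")  // illustrative table
  .option("user", user)
  .option("password", password)
  .mode("append")
  .save()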
1
vote
1 answer
Spark JDBC "batch size" effect on insert
I wanted to know what effect the batchsize option has on an insert operation using Spark JDBC. Does it perform a bulk insert using a single INSERT command, or a batch of INSERT commands that gets committed at the end?
Could someone…

justlikethat
- 329
- 2
- 12
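For what the option actually does: batchsize (default 1000) sets how many rows are grouped into one JDBC addBatch()/executeBatch() round trip. It is neither a single bulk-load command nor a per-batch commit; Spark commits once per partition. A sketch:

df.write.format("jdbc")
  .option("url", url)
  .option("dbtable", "target")
  .option("user", user)
  .option("password", password)
  .option("batchsize", "10000")  // rows per executeBatch() round trip
  .mode("append")
  .save()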
1
vote
0 answers
Issue when reading Teradata table via Apache Spark
I'm reading a Teradata table using Spark. Here is my code:
spark.read.format("jdbc")
.option("url", "jdbc:teradata://127.0.0.1/database=test, TMODE=TERA")
.option("username", "test")
.option("password", "test")
…

Finkelson
- 2,921
- 4
- 31
- 49
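One pitfall worth ruling out in snippets like the one above: the Spark JDBC source expects the credential keys user and password, and a username key will not authenticate. A hedged sketch with an illustrative table name and the Teradata driver class:

val df = spark.read.format("jdbc")
  .option("url", "jdbc:teradata://127.0.0.1/database=test, TMODE=TERA")
  .option("user", "test")         // "user", not "username"
  .option("password", "test")
  .option("dbtable", "my_table")  // illustrative
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .load()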
1
vote
1 answer
Spark JDBC UpperBound
jdbc(String url,
     String table,
     String columnName,
     long lowerBound,
     long upperBound,
     int numPartitions,
     …

Akhil
- 63
- 5
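For orientation: lowerBound and upperBound do not filter rows; together with numPartitions they only set the stride of the per-partition WHERE clauses. A sketch with illustrative values and a placeholder URL:

import java.util.Properties

val props = new Properties()
props.setProperty("user", "user")  // placeholders
props.setProperty("password", "password")

// With these bounds Spark generates predicates roughly like:
//   partition 0: id < 250 or id is null
//   partition 1: id >= 250 and id < 500
//   partition 2: id >= 500 and id < 750
//   partition 3: id >= 750
val df = spark.read.jdbc(
  "jdbc:postgresql://host:5432/db",
  "events", "id", 0L, 1000L, 4, props)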
1
vote
1 answer
How to register a JDBC Spark dialect in Python?
I am trying to read from a Databricks table. I have used the URL from a cluster in Databricks. I am getting this error:
java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
After these statements:
jdbcConnUrl=…

Samruddhi Padture
- 21
- 3
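Dialects are JVM objects, so from PySpark the registration has to go through the py4j gateway (spark._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect), with the dialect itself compiled onto the classpath. A Scala sketch of such a dialect; the URL prefix and type mapping are assumptions to adapt to the actual driver:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

object DatabricksDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:spark:")  // assumed prefix; check the driver's URL
  // Map the driver types that trip up the default dialect
  override def getCatalystType(sqlType: Int, typeName: String,
      size: Int, md: MetadataBuilder): Option[DataType] =
    if (typeName.equalsIgnoreCase("bool")) Some(BooleanType) else None
}

JdbcDialects.registerDialect(DatabricksDialect)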
1
vote
2 answers
Why does PostgreSQL say FATAL: sorry, too many clients already when I am nowhere close to the maximum connections?
I am working with an installation of PostgreSQL 11.2 that periodically complains in its system logs
FATAL: sorry, too many clients already
despite being nowhere close to its configured limit of connections. This query:
SELECT…

Eddie
- 53,828
- 22
- 125
- 145
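When Spark is in the picture this is worth checking from the Spark side as well: every JDBC partition opens its own connection, so a wide read or write can briefly hold far more connections than the application appears to need. A speculative sketch of capping that, with an illustrative cap:

// Eight partitions means at most eight concurrent Postgres connections
// from this write (plus whatever the rest of the app holds open).
df.coalesce(8)
  .write.format("jdbc")
  .option("url", url)
  .option("dbtable", "target")
  .option("user", user)
  .option("password", password)
  .mode("append")
  .save()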
1
vote
0 answers
How to connect to a SQL Server instance running on GCP using PySpark, without Secure Sockets Layer (SSL)?
I am trying to connect to a SQL Server database using PySpark as below:
from pyspark.sql import SparkSession
import traceback
def connect_and_read(spark: SparkSession):
url =…

Metadata
- 2,127
- 9
- 56
- 127
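A sketch of the usual knob, shown in Scala though the same options apply from PySpark: the Microsoft driver's encrypt connection property goes in the URL (with trustServerCertificate=true as the alternative if the handshake itself is the obstacle). Host, database, and credentials are placeholders:

val df = spark.read.format("jdbc")
  // encrypt=false asks the driver not to negotiate TLS at all
  .option("url", "jdbc:sqlserver://host:1433;databaseName=mydb;encrypt=false")
  .option("user", "sa")
  .option("password", "secret")
  .option("dbtable", "dbo.my_table")
  .load()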