Questions tagged [spark-jdbc]

78 questions
2 votes • 2 answers

Check whether a table exists with Spark JDBC

I am reading some data into a data frame from Microsoft SQL Server using Spark JDBC. When the table does not exist (for example, it was dropped accidentally) I get an exception: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object…
Cassie • 2,941 • 8 • 44 • 92
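
A minimal sketch of one way to avoid the exception: probe the catalog over plain JDBC before handing the read to Spark, so a dropped table surfaces as a Boolean instead of a mid-job SQLServerException. All connection details below are placeholders, not the asker's values.

```scala
import java.sql.DriverManager

// Probe DatabaseMetaData for the table before calling spark.read.
def tableExists(url: String, user: String, pass: String, table: String): Boolean = {
  val conn = DriverManager.getConnection(url, user, pass)
  try {
    // getTables returns one row per matching table; an empty result means it is gone
    conn.getMetaData.getTables(null, null, table, Array("TABLE")).next()
  } finally conn.close()
}
```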
2 votes • 1 answer

Spark JDBC read ends up in one partition only

I have the below code snippet for reading data from a PostgreSQL table, from which I am pulling all available data, i.e. select * from table_name: jdbcDF = spark.read \ .format("jdbc") \ .option("url", self.var_dict['jdbc_url']) \ …
Abhi • 163 • 2 • 14
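
For context, the usual fix discussed under this question: without partitionColumn, lowerBound, upperBound, and numPartitions, the JDBC source issues a single query, so everything lands in one partition. A hedged sketch in Scala (the question itself is PySpark; URL, credentials, and the id column are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partitioned-read").getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("dbtable", "table_name")
  .option("user", "user")
  .option("password", "pass")
  .option("partitionColumn", "id")  // any roughly evenly distributed numeric column
  .option("lowerBound", "1")        // min(id), queried beforehand if unknown
  .option("upperBound", "1000000")  // max(id)
  .option("numPartitions", "8")     // 8 concurrent queries, 8 output partitions
  .load()
```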
2 votes • 1 answer

How to specify Trust store and trust store type for Spark JDBC connection

I am new to Spark and we are currently using the Spark Java API to create ORC files from an Oracle database. I was able to configure the connection with sqlContext.read().jdbc(url, table, props). However, I couldn't find any way in the properties to specify…
Sai Kumar • 112 • 2 • 11
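
A sketch of one plausible route, assuming the Oracle thin driver honors javax.net.ssl.* connection properties (true for recent ojdbc releases; verify for your version). Paths, credentials, and the service URL are placeholders:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-ssl").getOrCreate()

// Every entry in props is forwarded verbatim to the JDBC driver.
val props = new Properties()
props.setProperty("user", "scott")
props.setProperty("password", "tiger")
props.setProperty("javax.net.ssl.trustStore", "/path/to/truststore.jks")
props.setProperty("javax.net.ssl.trustStoreType", "JKS")
props.setProperty("javax.net.ssl.trustStorePassword", "changeit")

val df = spark.read.jdbc("jdbc:oracle:thin:@//host:2484/service", "MY_TABLE", props)
```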
1 vote • 0 answers

Creating a partitioned table in Postgres via Spark JDBC write

I want to write a dataframe to a Postgres table via the Spark JDBC connector. The table I am writing to in Postgres needs to be partitioned by a certain column. This is currently how I am writing it. I am running Spark 3.2.3 and Postgres 11: val username…
sanchit08 • 119 • 1 • 7
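
One documented hook worth knowing here: the JDBC writer's createTableOptions option is appended verbatim to the CREATE TABLE statement Spark generates. A sketch, assuming a dataframe df already in scope (URL, table, and partition column are placeholders; in Postgres the child partitions still have to be created separately before rows can land):

```scala
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("dbtable", "events")                     // placeholder
  .option("user", "user")
  .option("password", "pass")
  // appended to Spark's generated CREATE TABLE statement
  .option("createTableOptions", "PARTITION BY RANGE (event_date)")
  .mode("overwrite") // Spark creates the table when it does not exist
  .save()
```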
1 vote • 0 answers

How to write a Spark DataFrame into multiple JDBC tables based on a column

I'm working with a batch Spark pipeline written in Scala (v2.4). I would like to save a dataframe into a PostgreSQL database. However, instead of saving all rows into a single table in the database, I want to save them to multiple tables based on…
IllSc • 1,419 • 3 • 17 • 24
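
The JDBC sink writes to exactly one table per save, so the common workaround is one filtered write per distinct key. A sketch, with df and the category column as placeholders (cache df first if the key set is large, since each key triggers a scan):

```scala
import org.apache.spark.sql.functions.col

val keys = df.select("category").distinct().collect().map(_.getString(0))

keys.foreach { k =>
  df.filter(col("category") === k)
    .write
    .format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db") // placeholder
    .option("dbtable", s"table_$k")                  // one target table per key
    .option("user", "user")
    .option("password", "pass")
    .mode("append")
    .save()
}
```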
1 vote • 0 answers

Override JdbcUtils `saveTable` method

How can I extend the spark-jdbc sink and override the saveTable method? I want to use one transaction for the entire dataframe batch instead of separate transactions per…
StarScream • 223 • 2 • 12
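
JdbcUtils.saveTable is private to Spark, so rather than overriding it, the workaround usually suggested is to drive JDBC by hand from a single partition and commit once. A sketch, assuming a dataframe df in scope; connection details and the two-column target table are placeholders, and coalesce(1) deliberately serializes the write:

```scala
import java.sql.DriverManager
import org.apache.spark.sql.Row

val (url, user, pass) = ("jdbc:postgresql://host:5432/db", "user", "pass") // placeholders

df.coalesce(1).foreachPartition { (rows: Iterator[Row]) =>
  val conn = DriverManager.getConnection(url, user, pass)
  conn.setAutoCommit(false)
  val stmt = conn.prepareStatement("INSERT INTO target (a, b) VALUES (?, ?)")
  try {
    rows.foreach { r =>
      stmt.setString(1, r.getString(0))
      stmt.setInt(2, r.getInt(1))
      stmt.addBatch()
    }
    stmt.executeBatch()
    conn.commit() // one commit covering the whole dataframe
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally conn.close()
}
```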
1 vote • 1 answer

Schema capitalization (uppercase) problem when reading with Spark

Using Scala here: val df = spark.read.format("jdbc"). option("url", ""). option("dbtable", "UPPERCASE_SCHEMA.table_name"). option("user", "postgres"). option("password", ""). option("numPartitions", 50). …
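
The likely culprit: Postgres folds unquoted identifiers to lower case, so an upper-case schema generally needs explicit double quotes inside dbtable. A sketch, assuming a SparkSession named spark, with placeholder URL and credentials:

```scala
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")      // placeholder
  .option("dbtable", "\"UPPERCASE_SCHEMA\".table_name") // schema kept upper case by quoting
  .option("user", "postgres")
  .option("password", "pass")
  .load()
```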
1 vote • 0 answers

How to get Spark metrics for the Spark JDBC writer

Versions: Scala 2.11, Spark 2.4.4. To implement this, I have created my own implementation of SparkListener and added it while creating the Spark session. class SparkMetricListener extends SparkListener { ... override def onTaskEnd .. { .. //use…
VimalK • 65 • 1 • 8
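
A minimal version of the listener the asker describes, assuming a SparkSession named spark. One caveat: whether the v1 JDBC sink populates outputMetrics varies by Spark version, so treat recordsWritten here as something to verify, not a guarantee:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class SparkMetricListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // per-task output counters; may read zero if the sink does not report them
    val out = taskEnd.taskMetrics.outputMetrics
    println(s"task ${taskEnd.taskInfo.taskId}: ${out.recordsWritten} records written")
  }
}

spark.sparkContext.addSparkListener(new SparkMetricListener)
```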
1 vote • 1 answer

Spark SQL: INSERT statement with JDBC does not support default values

I am trying to read/write data from other databases using JDBC, just following the doc https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html. But I found Spark SQL does not work well with DEFAULT values or AUTO_INCREMENT: CREATE TEMPORARY…
shiyuhang • 31 • 5
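
One workaround that sidesteps the limitation: the DataFrame writer only inserts the columns present in the dataframe, so dropping the auto-increment/DEFAULT column lets the database fill it in. A sketch with placeholder names, assuming a dataframe df in scope:

```scala
df.drop("id") // placeholder: the AUTO_INCREMENT / DEFAULT column
  .write
  .format("jdbc")
  .option("url", "jdbc:mysql://host:3306/db") // placeholder
  .option("dbtable", "target")
  .option("user", "user")
  .option("password", "pass")
  .mode("append")
  .save()
```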
1 vote • 1 answer

Spark JDBC "batch size" effect on insert

I wanted to know what effect the batchsize option has on an insert operation using Spark JDBC. Does it mean a single bulk INSERT command, or a batch of INSERT commands that gets committed at the end? Could someone…
justlikethat • 329 • 2 • 12
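
Short answer, per the Spark docs: batchsize controls JDBC statement batching (addBatch/executeBatch), not a database-specific bulk loader, and the commit happens per partition rather than per batch. A sketch with placeholder URL and table, assuming a dataframe df in scope:

```scala
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("dbtable", "target")
  .option("user", "user")
  .option("password", "pass")
  .option("batchsize", "10000") // rows per executeBatch call; default is 1000
  .mode("append")
  .save()
```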
1 vote • 0 answers

Issue when reading Teradata table via Apache Spark

I'm reading a Teradata table using Spark. Here is my code: spark.read.format("jdbc") .option("url", "jdbc:teradata://127.0.0.1/database=test, TMODE=TERA") .option("username", "test") .option("password", "test") …
Finkelson • 2,921 • 4 • 31 • 49
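
One thing worth checking against the snippet above: Spark's JDBC source documents the credential key as user, not username; unrecognized keys are passed through to the driver and may be silently ignored. A sketch with the documented keys, assuming a SparkSession named spark (the table name is hypothetical):

```scala
val df = spark.read.format("jdbc")
  .option("url", "jdbc:teradata://127.0.0.1/database=test, TMODE=TERA")
  .option("user", "test") // Spark expects "user", not "username"
  .option("password", "test")
  .option("dbtable", "test_table") // hypothetical table name
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .load()
```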
1 vote • 1 answer

Spark JDBC UpperBound

jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions, …
Akhil • 63 • 5
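
The key point behind this question: lowerBound and upperBound only shape the generated partition predicates; they never filter rows. A sketch of what Spark produces for lowerBound = 1, upperBound = 100, numPartitions = 4, assuming a SparkSession named spark (URL and table are placeholders):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("user", "user")
props.setProperty("password", "pass")

// Approximate per-partition predicates Spark generates on column `id`:
//   partition 1: WHERE id < 26 OR id IS NULL
//   partition 2: WHERE id >= 26 AND id < 51
//   partition 3: WHERE id >= 51 AND id < 76
//   partition 4: WHERE id >= 76   <- also sweeps up rows beyond upperBound
val df = spark.read.jdbc(
  "jdbc:postgresql://host:5432/db", // placeholder
  "t",                              // placeholder table
  "id", 1L, 100L, 4, props)
```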
1 vote • 1 answer

How to register a JDBC Spark dialect in Python?

I am trying to read from a Databricks table. I have used the URL from a cluster in Databricks. I am getting this error: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int. After these statements: jdbcConnUrl=…
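
For reference, dialects are a JVM-side API; from PySpark the usual route is to compile something like the following into a jar and register it through the py4j gateway (spark._jvm). The URL prefix and quoting rule below are placeholders for whatever the Simba driver actually needs:

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object MyDialect extends JdbcDialect {
  // placeholder prefix; match it to your actual JDBC URL
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:spark:")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

JdbcDialects.registerDialect(MyDialect)
```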
1 vote • 2 answers

Why does PostgreSQL say FATAL: sorry, too many clients already when I am nowhere close to the maximum connections?

I am working with an installation of PostgreSQL 11.2 that periodically complains in its system logs FATAL: sorry, too many clients already, despite being nowhere close to its configured limit of connections. This query: SELECT…
Eddie • 53,828 • 22 • 125 • 145
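
A Spark-side contribution to this symptom, relevant to the tag: a JDBC write opens one connection per partition, so connection bursts scale with partition count rather than with the number of client applications. The documented numPartitions option caps that (Spark coalesces down to it before writing). Sketch with placeholder details, assuming a dataframe df in scope:

```scala
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("dbtable", "target")
  .option("user", "user")
  .option("password", "pass")
  .option("numPartitions", "8") // caps concurrent JDBC connections at 8
  .mode("append")
  .save()
```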
1 vote • 0 answers

How to connect from PySpark running on GCP to SQL Server without using Secure Sockets Layer?

I am trying to connect to a SQL Server database using PySpark as below: from pyspark.sql import SparkSession import traceback def connect_and_read(spark: SparkSession): url =…
Metadata • 2,127 • 9 • 56 • 127
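
A sketch of the usual knob, assuming the Microsoft mssql-jdbc driver: its encrypt URL property controls SSL, and driver versions 10.x and later default it to true, so skipping SSL takes an explicit encrypt=false. The question is PySpark; this Scala sketch uses the same option keys, and host, database, and credentials are placeholders:

```scala
val df = spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://host:1433;databaseName=db;encrypt=false")
  .option("dbtable", "dbo.my_table") // hypothetical table
  .option("user", "user")
  .option("password", "pass")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load()
```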