Questions tagged [spark-jdbc]
78 questions
1 vote · 1 answer
Spark does not push the filter down to the PostgreSQL data source when reading data in parallel with lower bound and upper bound values provided
I am trying to read the data from the PostgreSQL table in parallel. I am using the timestamp column as the partition column and providing the values for the lower bound, upper bound and numPartitions. It is creating multiple queries to read data in…
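A minimal sketch of the parallel read being described, assuming a SparkSession named spark and a hypothetical URL, table, and created_at timestamp column; filters applied to the result may or may not be pushed down, which df.explain() will show:

// Spark generates numPartitions queries, each covering one slice of
// [lowerBound, upperBound) on the partition column.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/mydb")  // hypothetical URL
  .option("dbtable", "events")                        // hypothetical table
  .option("partitionColumn", "created_at")
  .option("lowerBound", "2020-01-01 00:00:00")
  .option("upperBound", "2020-12-31 23:59:59")
  .option("numPartitions", "8")
  .load()

// Check whether this filter appears as a PushedFilters entry in the plan.
df.filter("status = 'ACTIVE'").explain()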

Nikunj Kakadiya · 2,689 · 2 · 20 · 35

1 vote · 1 answer
I want Spark to ignore bad records while saving into database
I am saving rows to a database using Spark JDBC. Saving the data works fine.
Issue: Spark aborts the save if it encounters any bad records (e.g. a null value in a column the table expects to be non-null)
What I want: I want Spark…
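Spark's JDBC sink has no built-in bad-record skipping, so one workaround is to split out rows that would violate the target table's constraints before writing; a minimal sketch, assuming df, url and connectionProperties are in scope and name is a hypothetical NOT NULL column:

import org.apache.spark.sql.functions.col

// Rows satisfying the constraint are written; the rest are kept for review.
val good = df.filter(col("name").isNotNull)
val bad  = df.filter(col("name").isNull)

good.write.mode("append").jdbc(url, "my_table", connectionProperties)
bad.write.mode("overwrite").parquet("/tmp/rejected_rows")  // hypothetical path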

nav · 148 · 2 · 5 · 13

1 vote · 2 answers
How many connections to the database from Spark while writing a dataframe?
I'm confused about how many connections Spark would make to the database in the scenario below:
Let's say I have a Spark program which is running only on one worker node with one executor, and the number of partitions in a dataframe is 10. I want to…
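As a rule of thumb, the JDBC writer opens one connection per partition-writing task, so 10 partitions mean 10 connections over the life of the write, but only as many at once as there are concurrently running tasks (bounded by executor cores). A sketch of capping the concurrency, with url and connectionProperties assumed in scope:

// coalesce(2) means at most 2 write tasks, hence at most 2 simultaneous
// connections from this single-executor job.
df.coalesce(2)
  .write
  .mode("append")
  .jdbc(url, "target_table", connectionProperties)  // hypothetical table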

amit kumar · 55 · 1 · 7

1 vote · 1 answer
Spark jdbc read performance tuning with no primary key column
I am running a Spark analytics application and reading an MS SQL Server table (the whole table) directly using Spark JDBC. The table has more than 30M records but doesn't have any primary key column or integer column. Since the table doesn't have such a column…
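With no numeric or timestamp column to range-partition on, one fallback is the predicates overload of read.jdbc, which takes one WHERE fragment per partition; a sketch assuming a SQL Server checksum split, with all names hypothetical:

import java.util.Properties

val props = new Properties()
props.setProperty("user", "...")      // credentials elided
props.setProperty("password", "...")

// One WHERE fragment per partition; ABS(CHECKSUM(*)) spreads rows across
// 8 buckets without requiring a key column.
val predicates = (0 until 8).map(i => s"ABS(CHECKSUM(*)) % 8 = $i").toArray

val df = spark.read.jdbc(
  "jdbc:sqlserver://host;databaseName=mydb",  // hypothetical URL
  "dbo.big_table",
  predicates,
  props)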

Sandeep Singh · 7,790 · 4 · 43 · 68

1 vote · 0 answers
Is it necessary to add SaveMode Delete, Update and Upsert for Spark jdbc source?
Do you think it is necessary to add SaveMode for Delete, Update, and Upsert? Such as:
SaveMode.Delete
SaveMode.Update
SaveMode.Upsert
referring to the code: JdbcRelationProvider.scala
I've analyzed its code for SaveTable: JdbcUtils.scala, and…
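Since SaveMode currently offers only Append, Overwrite, ErrorIfExists and Ignore, upserts are usually hand-rolled through foreachPartition; a minimal sketch for PostgreSQL-style ON CONFLICT, with table, columns and credentials all hypothetical:

df.foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
  // One plain JDBC connection per partition; pooling and retries omitted.
  val conn = java.sql.DriverManager.getConnection(url, user, password)
  val stmt = conn.prepareStatement(
    "INSERT INTO my_table (id, value) VALUES (?, ?) " +
    "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value")
  try {
    rows.foreach { row =>
      stmt.setLong(1, row.getLong(0))
      stmt.setString(2, row.getString(1))
      stmt.addBatch()
    }
    stmt.executeBatch()
  } finally {
    stmt.close()
    conn.close()
  }
}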

timothyzhang · 730 · 9 · 12

1 vote · 0 answers
Instrumenting Spark JDBC with javaagent
I am attempting to instrument JDBC calls using the Kamon JDBC Kanela agent in my Spark app.
I am able to successfully instrument JDBC calls in a non-Spark test app by passing -javaagent:kanela-agent-1.0.1.jar on the command line when I run the…
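In a Spark app the agent normally has to be attached to every JVM, not just the launcher, which means wiring it through extraJavaOptions for both driver and executors and shipping the jar alongside; a hedged sketch of the spark-submit flags, with class and jar names hypothetical:

spark-submit \
  --files kanela-agent-1.0.1.jar \
  --conf "spark.driver.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar" \
  --conf "spark.executor.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar" \
  --class com.example.MyApp myapp.jar   # hypothetical class and jar

Note the executor-side path: on YARN a jar shipped with --files lands in each executor's working directory under its bare name, but other cluster managers may need an absolute path.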

JoeMjr2 · 3,804 · 4 · 34 · 62

1 vote · 1 answer
Calculate lower and upper bounds for partition Spark JDBC
I read data from MS SQL Server using Spark JDBC with Scala, and I would like to partition this data by a specified column. I do not want to set the lower and upper bounds for the partition column manually. Can I read some kind of maximum and…
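One common pattern is to compute the column's min and max on the database side with a small pushdown query, then feed them into the partitioned read; a sketch with a hypothetical numeric id column:

// Step 1: a single-row query computes the bounds on the server.
val bounds = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "(SELECT MIN(id) AS lo, MAX(id) AS hi FROM dbo.my_table) b")
  .load()
  .collect()(0)

// Step 2: use the computed bounds for the parallel read.
val df = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "dbo.my_table")
  .option("partitionColumn", "id")
  .option("lowerBound", bounds.get(0).toString)
  .option("upperBound", bounds.get(1).toString)
  .option("numPartitions", "8")
  .load()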

Cassie · 2,941 · 8 · 44 · 92

1 vote · 0 answers
Spark JDBC and transaction pooling in PGBouncer
I am using the Spark JDBC DataFrameReader to query a Postgres DB; the query is executed through PGBouncer running in transaction pooling mode.
Starting with the second executed query, I receive the following error:
org.postgresql.util.PSQLException: ERROR: prepared…
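The usual workaround with PGBouncer's transaction pooling is to stop pgjdbc from switching to named server-side prepared statements by adding prepareThreshold=0 to the connection URL; a sketch, host and database hypothetical:

// With prepareThreshold=0 the driver sticks to unnamed statements, which
// survive transaction-pooled connections.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://pgbouncer-host:6432/mydb?prepareThreshold=0")
  .option("dbtable", "my_table")  // hypothetical table
  .load()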

Alex Stanovsky · 1,286 · 1 · 13 · 28

1 vote · 1 answer
Spark JDBC read tuning for a table without a primary key
I am reading 30M records from an Oracle table with no primary key columns.
The Spark JDBC read hangs and fetches no data, whereas I can get the result for the same query in Oracle SQL Developer within a few seconds.
oracleDf =…
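Two levers that often help on Oracle without a key column are a larger fetchsize (the Oracle driver fetches only 10 rows per round trip by default) and predicate partitioning over ORA_HASH; a sketch, all names hypothetical:

val props = new java.util.Properties()
props.setProperty("user", "...")          // credentials elided
props.setProperty("password", "...")
props.setProperty("fetchsize", "10000")   // the default of 10 makes reads crawl

// ORA_HASH(expr, 7) buckets rows into 0..7, giving 8 parallel slices.
val predicates = (0 until 8).map(i => s"ORA_HASH(some_col, 7) = $i").toArray

val oracleDf = spark.read.jdbc(
  "jdbc:oracle:thin:@//host:1521/service",  // hypothetical URL
  "SCHEMA.BIG_TABLE",
  predicates,
  props)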

Ramakrishna · 1,170 · 2 · 10 · 17

1 vote · 1 answer
pySpark jdbc write error: An error occurred while calling o43.jdbc. : scala.MatchError: null
I am trying to write a simple Spark dataframe to a DB2 database using pySpark. The dataframe has only one column, with double as its data type.
This is the dataframe with only one row and one column:
This is the dataframe schema:
When I try to write this…
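scala.MatchError: null from the JDBC writer typically means a column carries NullType (for instance a column built from bare nulls), for which Spark cannot choose a JDBC type; casting to an explicit type before writing is the usual fix. A Scala sketch of the idea, column name hypothetical:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// A NullType column has no JDBC mapping; give it an explicit type first.
val fixed = df.withColumn("value", col("value").cast(DoubleType))
fixed.write.mode("append").jdbc(url, "my_table", connectionProperties)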

vaibhav sapkal · 11 · 2

1 vote · 0 answers
Processing a huge database table with Spark
I have a huge database table which contains millions of records.
Each record can be processed in isolation and has to be converted into, let's say, a string.
So I started looking around and I was wondering if Spark could help me in this scenario.…
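For this record-at-a-time shape, a partitioned JDBC read followed by a per-row map is the natural Spark structure; a sketch, with the table, bounds and output path all hypothetical:

// Read the table in parallel slices, then transform each record in isolation.
val records = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "big_table")
  .option("partitionColumn", "id")      // assumes a numeric id column
  .option("lowerBound", "1")
  .option("upperBound", "100000000")
  .option("numPartitions", "32")
  .load()

import spark.implicits._
// Each row becomes, say, a pipe-delimited string.
val strings = records.map(_.mkString("|"))
strings.write.text("/tmp/processed")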

Andrea · 2,714 · 3 · 27 · 38

1 vote · 0 answers
SparkSQL JDBC writer fails with "Cannot acquire locks error"
I'm trying to insert 50 million rows from a Hive table into a SQL Server table using the SparkSQL JDBC writer. Below is the line of code that I'm using to insert the data:
mdf1.coalesce(4).write.mode(SaveMode.Append).jdbc(connectionString, "dbo.TEST_TABLE",…
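On SQL Server, one very large transaction per partition tends to trip lock escalation; smaller JDBC batches are the main Spark-side lever. A sketch keeping the question's coalesce(4), with connectionProperties assumed:

// "batchsize" controls rows per batched INSERT (Spark's default is 1000);
// each partition still commits as its own transaction.
mdf1.coalesce(4)
  .write
  .mode(SaveMode.Append)
  .option("batchsize", "10000")
  .jdbc(connectionString, "dbo.TEST_TABLE", connectionProperties)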

sunny · 11 · 2

1 vote · 0 answers
How can I set the fetch size when getting results from Spark Thrift Server using JDBC?
I have tried using statement.setFetchSize(required number), but that works when I connect to Hive directly over JDBC, not when I go through the Spark Thrift Server. My query produces a large result set, causing an OOM on the Thrift Server.
Is there…
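For the Spark Thrift Server, the knob that usually matters for large result sets is spark.sql.thriftServer.incrementalCollect, which streams results partition by partition instead of collecting everything on the server; treat the following as a hedged sketch, paths and query hypothetical:

// Server side: start the Thrift Server with incremental collect enabled.
//   ./sbin/start-thriftserver.sh \
//     --conf spark.sql.thriftServer.incrementalCollect=true

// Client side: fetch size remains a per-statement hint over Hive JDBC.
val conn = java.sql.DriverManager.getConnection("jdbc:hive2://host:10000/default")
val stmt = conn.createStatement()
stmt.setFetchSize(1000)
val rs = stmt.executeQuery("SELECT * FROM big_table")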

user9024779 · 87 · 8

1 vote · 1 answer
Does df.write.jdbc handle JDBC pool connection?
Do you know whether the following line handles JDBC connection pooling:
df.write
  .mode("append")
  .jdbc(url, table, prop)
Do you have any idea? Thanks
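It does not: each partition task opens a plain DriverManager connection, writes its rows, and closes it, with no pooling involved. If pooling is a requirement, one option is writing through foreachPartition with a per-JVM pool; a sketch using HikariCP, with all connection details hypothetical:

import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

// A Scala object gives one lazily-built pool per executor JVM.
object Pool {
  lazy val ds: HikariDataSource = {
    val cfg = new HikariConfig()
    cfg.setJdbcUrl("jdbc:postgresql://host:5432/mydb")  // hypothetical
    cfg.setUsername("user")
    cfg.setPassword("secret")
    cfg.setMaximumPoolSize(4)
    new HikariDataSource(cfg)
  }
}

df.foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
  val conn = Pool.ds.getConnection()  // borrowed from the pool
  try {
    val stmt = conn.prepareStatement("INSERT INTO my_table (id) VALUES (?)")
    rows.foreach { r => stmt.setLong(1, r.getLong(0)); stmt.addBatch() }
    stmt.executeBatch()
    stmt.close()
  } finally conn.close()              // close() returns it to the pool
}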

a.moussa · 2,977 · 7 · 34 · 56

1 vote · 1 answer
How to tune mapping/filtering on big datasets (cross joined from two datasets)?
Spark 2.2.0
I have the following code, converted from a SQL script. It has been running for two hours and is still going, even slower than SQL Server. Is anything not done correctly?
The following is the plan,
Push table2 to all executors…
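When one side fits in executor memory, an explicit broadcast join matches the "push table2 to all executors" step of this plan while avoiding a shuffle of the large side; a hedged sketch with hypothetical names:

import org.apache.spark.sql.functions.broadcast

// The small table is shipped to every executor once; the join then runs
// locally per partition of the large table.
val joined = bigDf.join(broadcast(smallDf), bigDf("key") === smallDf("key"))

// Filter as early as possible so fewer rows reach the join and the map.
val result = joined.filter(bigDf("amount") > 0)  // hypothetical predicate
result.explain()  // look for BroadcastHashJoin in the plan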

ca9163d9 · 27,283 · 64 · 210 · 413