Questions tagged [spark-jdbc]
78 questions
1 vote · 1 answer
Spark does not push the filter down to the PostgreSQL data source when reading data in parallel with lower bound and upper bound values provided
I am trying to read the data from the PostgreSQL table in parallel. I am using the timestamp column as the partition column and providing the values for the lower bound, upper bound and numPartitions. It is creating multiple queries to read data in…
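A minimal sketch of the parallel read being described, assuming a SparkSession named spark and a hypothetical URL, table, and created_at timestamp column; filters applied to the result may or may not be pushed down, which df.explain() will show:

// Spark generates numPartitions queries, each covering one slice of
// [lowerBound, upperBound) on the partition column.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/mydb")  // hypothetical URL
  .option("dbtable", "events")                        // hypothetical table
  .option("partitionColumn", "created_at")
  .option("lowerBound", "2020-01-01 00:00:00")
  .option("upperBound", "2020-12-31 23:59:59")
  .option("numPartitions", "8")
  .load()

// Check whether this filter appears as a PushedFilters entry in the plan.
df.filter("status = 'ACTIVE'").explain()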

Nikunj Kakadiya · 2,689 · 2 · 20 · 35

1 vote · 1 answer
I want Spark to ignore bad records while saving into database
I am saving rows to a database using Spark JDBC. Saving the data works fine.
Issue: Spark aborts the save if it encounters any bad records (e.g. a null value in a column the table expects to be non-null)
What I want: I want Spark…
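Spark's JDBC sink has no built-in bad-record skipping, so one workaround is to split out rows that would violate the target table's constraints before writing; a minimal sketch, assuming df, url and connectionProperties are in scope and name is a hypothetical NOT NULL column:

import org.apache.spark.sql.functions.col

// Rows satisfying the constraint are written; the rest are kept for review.
val good = df.filter(col("name").isNotNull)
val bad  = df.filter(col("name").isNull)

good.write.mode("append").jdbc(url, "my_table", connectionProperties)
bad.write.mode("overwrite").parquet("/tmp/rejected_rows")  // hypothetical path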

nav · 148 · 2 · 5 · 13

1 vote · 2 answers
How many connections to the database from Spark while writing a dataframe?
I'm confused about how many connections Spark would make to the database in the scenario below:
Let's say I have a Spark program which is running only on one worker node with one executor, and the number of partitions in a dataframe is 10. I want to…
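As a rule of thumb, the JDBC writer opens one connection per partition-writing task, so 10 partitions mean 10 connections over the life of the write, but only as many at once as there are concurrently running tasks (bounded by executor cores). A sketch of capping the concurrency, with url and connectionProperties assumed in scope:

// coalesce(2) means at most 2 write tasks, hence at most 2 simultaneous
// connections from this single-executor job.
df.coalesce(2)
  .write
  .mode("append")
  .jdbc(url, "target_table", connectionProperties)  // hypothetical table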

amit kumar · 55 · 1 · 7

1 vote · 1 answer
Spark jdbc read performance tuning with no primary key column
I am running a Spark analytics application and reading an MS SQL Server table (the whole table) directly using Spark JDBC. The table has more than 30M records but doesn't have any primary key column or integer column. Since the table doesn't have such a column…
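With no numeric or timestamp column to range-partition on, one fallback is the predicates overload of read.jdbc, which takes one WHERE fragment per partition; a sketch assuming a SQL Server checksum split, with all names hypothetical:

import java.util.Properties

val props = new Properties()
props.setProperty("user", "...")      // credentials elided
props.setProperty("password", "...")

// One WHERE fragment per partition; ABS(CHECKSUM(*)) spreads rows across
// 8 buckets without requiring a key column.
val predicates = (0 until 8).map(i => s"ABS(CHECKSUM(*)) % 8 = $i").toArray

val df = spark.read.jdbc(
  "jdbc:sqlserver://host;databaseName=mydb",  // hypothetical URL
  "dbo.big_table",
  predicates,
  props)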

Sandeep Singh · 7,790 · 4 · 43 · 68

1 vote · 0 answers
Is it necessary to add SaveMode Delete, Update and Upsert for Spark jdbc source?
Do you think it is necessary to add SaveMode for Delete, Update, and Upsert? Such as:
SaveMode.Delete
SaveMode.Update
SaveMode.Upsert
referring to the code: JdbcRelationProvider.scala
I've analyzed its code for SaveTable: JdbcUtils.scala, and…
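Since SaveMode currently offers only Append, Overwrite, ErrorIfExists and Ignore, upserts are usually hand-rolled through foreachPartition; a minimal sketch for PostgreSQL-style ON CONFLICT, with table, columns and credentials all hypothetical:

df.foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
  // One plain JDBC connection per partition; pooling and retries omitted.
  val conn = java.sql.DriverManager.getConnection(url, user, password)
  val stmt = conn.prepareStatement(
    "INSERT INTO my_table (id, value) VALUES (?, ?) " +
    "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value")
  try {
    rows.foreach { row =>
      stmt.setLong(1, row.getLong(0))
      stmt.setString(2, row.getString(1))
      stmt.addBatch()
    }
    stmt.executeBatch()
  } finally {
    stmt.close()
    conn.close()
  }
}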

timothyzhang · 730 · 9 · 12

1 vote · 0 answers
Instrumenting Spark JDBC with javaagent
I am attempting to instrument JDBC calls using the Kamon JDBC Kanela agent in my Spark app.
I am able to successfully instrument JDBC calls in a non-Spark test app by passing -javaagent:kanela-agent-1.0.1.jar on the command line when I run the…
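In a Spark app the agent normally has to be attached to every JVM, not just the launcher, which means wiring it through extraJavaOptions for both driver and executors and shipping the jar alongside; a hedged sketch of the spark-submit flags, with class and jar names hypothetical:

spark-submit \
  --files kanela-agent-1.0.1.jar \
  --conf "spark.driver.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar" \
  --conf "spark.executor.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar" \
  --class com.example.MyApp myapp.jar   # hypothetical class and jar

Note the executor-side path: on YARN a jar shipped with --files lands in each executor's working directory under its bare name, but other cluster managers may need an absolute path.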

JoeMjr2 · 3,804 · 4 · 34 · 62

1 vote · 1 answer
Calculate lower and upper bounds for partition Spark JDBC
I read data from MS SQL Server using Spark JDBC with Scala, and I would like to partition this data by a specified column. I do not want to set the lower and upper bounds for the partition column manually. Can I read some kind of maximum and…
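One common pattern is to compute the column's min and max on the database side with a small pushdown query, then feed them into the partitioned read; a sketch with a hypothetical numeric id column:

// Step 1: a single-row query computes the bounds on the server.
val bounds = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "(SELECT MIN(id) AS lo, MAX(id) AS hi FROM dbo.my_table) b")
  .load()
  .collect()(0)

// Step 2: use the computed bounds for the parallel read.
val df = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "dbo.my_table")
  .option("partitionColumn", "id")
  .option("lowerBound", bounds.get(0).toString)
  .option("upperBound", bounds.get(1).toString)
  .option("numPartitions", "8")
  .load()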

Cassie · 2,941 · 8 · 44 · 92

1 vote · 0 answers
Spark JDBC and transaction pooling in PGBouncer
I am using the Spark JDBC DataFrameReader to query a Postgres DB; the query is executed through PGBouncer running in transaction pooling mode.
Starting with the second executed query, I receive the following error:
org.postgresql.util.PSQLException: ERROR: prepared…
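The usual workaround with PGBouncer's transaction pooling is to stop pgjdbc from switching to named server-side prepared statements by adding prepareThreshold=0 to the connection URL; a sketch, host and database hypothetical:

// With prepareThreshold=0 the driver sticks to unnamed statements, which
// survive transaction-pooled connections.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://pgbouncer-host:6432/mydb?prepareThreshold=0")
  .option("dbtable", "my_table")  // hypothetical table
  .load()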

Alex Stanovsky · 1,286 · 1 · 13 · 28

1 vote · 1 answer
Spark JDBC read tuning for a table without a primary key
I am reading 30M records from an Oracle table with no primary key columns.
The Spark JDBC read hangs and fetches no data, whereas I can get the result for the same query in Oracle SQL Developer within a few seconds.
oracleDf =…
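Two levers that often help on Oracle without a key column are a larger fetchsize (the Oracle driver fetches only 10 rows per round trip by default) and predicate partitioning over ORA_HASH; a sketch, all names hypothetical:

val props = new java.util.Properties()
props.setProperty("user", "...")          // credentials elided
props.setProperty("password", "...")
props.setProperty("fetchsize", "10000")   // the default of 10 makes reads crawl

// ORA_HASH(expr, 7) buckets rows into 0..7, giving 8 parallel slices.
val predicates = (0 until 8).map(i => s"ORA_HASH(some_col, 7) = $i").toArray

val oracleDf = spark.read.jdbc(
  "jdbc:oracle:thin:@//host:1521/service",  // hypothetical URL
  "SCHEMA.BIG_TABLE",
  predicates,
  props)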

Ramakrishna · 1,170 · 2 · 10 · 17

1 vote · 1 answer
pySpark jdbc write error: An error occurred while calling o43.jdbc. : scala.MatchError: null
I am trying to write a simple Spark dataframe to a DB2 database using pySpark. The dataframe has only one column, with double as its data type.
This is the dataframe with only one row and one column:
This is the dataframe schema:
When I try to write this…
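scala.MatchError: null from the JDBC writer typically means a column carries NullType (for instance a column built from bare nulls), for which Spark cannot choose a JDBC type; casting to an explicit type before writing is the usual fix. A Scala sketch of the idea, column name hypothetical:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// A NullType column has no JDBC mapping; give it an explicit type first.
val fixed = df.withColumn("value", col("value").cast(DoubleType))
fixed.write.mode("append").jdbc(url, "my_table", connectionProperties)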

vaibhav sapkal · 11 · 2

1 vote · 0 answers
Processing a huge database table with Spark
I have a huge database table which contains millions of records.
Each record can be processed in isolation and has to be converted into, let's say, a string.
So I started looking around and I was wondering if Spark could help me in this scenario.…
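For this record-at-a-time shape, a partitioned JDBC read followed by a per-row map is the natural Spark structure; a sketch, with the table, bounds and output path all hypothetical:

// Read the table in parallel slices, then transform each record in isolation.
val records = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "big_table")
  .option("partitionColumn", "id")      // assumes a numeric id column
  .option("lowerBound", "1")
  .option("upperBound", "100000000")
  .option("numPartitions", "32")
  .load()

import spark.implicits._
// Each row becomes, say, a pipe-delimited string.
val strings = records.map(_.mkString("|"))
strings.write.text("/tmp/processed")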

Andrea · 2,714 · 3 · 27 · 38

1 vote · 0 answers
SparkSQL JDBC writer fails with "Cannot acquire locks error"
I'm trying to insert 50 million rows from a Hive table into a SQL Server table using the SparkSQL JDBC writer. Below is the line of code that I'm using to insert the data:
mdf1.coalesce(4).write.mode(SaveMode.Append).jdbc(connectionString, "dbo.TEST_TABLE",…
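On SQL Server, one very large transaction per partition tends to trip lock escalation; smaller JDBC batches are the main Spark-side lever. A sketch keeping the question's coalesce(4), with connectionProperties assumed:

// "batchsize" controls rows per batched INSERT (Spark's default is 1000);
// each partition still commits as its own transaction.
mdf1.coalesce(4)
  .write
  .mode(SaveMode.Append)
  .option("batchsize", "10000")
  .jdbc(connectionString, "dbo.TEST_TABLE", connectionProperties)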

sunny · 11 · 2

1 vote · 0 answers
How can I set the fetch size when getting results from Spark Thrift Server using JDBC?
I have tried using statement.setFetchSize(required number), but that works when I connect to Hive directly over JDBC, not when I go through the Spark Thrift Server. My query produces a large result set, causing an OOM on the Thrift Server.
Is there…
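For the Spark Thrift Server, the knob that usually matters for large result sets is spark.sql.thriftServer.incrementalCollect, which streams results partition by partition instead of collecting everything on the server; treat the following as a hedged sketch, paths and query hypothetical:

// Server side: start the Thrift Server with incremental collect enabled.
//   ./sbin/start-thriftserver.sh \
//     --conf spark.sql.thriftServer.incrementalCollect=true

// Client side: fetch size remains a per-statement hint over Hive JDBC.
val conn = java.sql.DriverManager.getConnection("jdbc:hive2://host:10000/default")
val stmt = conn.createStatement()
stmt.setFetchSize(1000)
val rs = stmt.executeQuery("SELECT * FROM big_table")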

user9024779 · 87 · 8

1 vote · 1 answer
Does df.write.jdbc handle JDBC pool connection?
Do you know whether the following line handles JDBC connection pooling:
df.write
  .mode("append")
  .jdbc(url, table, prop)
Do you have any idea? Thanks
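It does not: each partition task opens a plain DriverManager connection, writes its rows, and closes it, with no pooling involved. If pooling is a requirement, one option is writing through foreachPartition with a per-JVM pool; a sketch using HikariCP, with all connection details hypothetical:

import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

// A Scala object gives one lazily-built pool per executor JVM.
object Pool {
  lazy val ds: HikariDataSource = {
    val cfg = new HikariConfig()
    cfg.setJdbcUrl("jdbc:postgresql://host:5432/mydb")  // hypothetical
    cfg.setUsername("user")
    cfg.setPassword("secret")
    cfg.setMaximumPoolSize(4)
    new HikariDataSource(cfg)
  }
}

df.foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
  val conn = Pool.ds.getConnection()  // borrowed from the pool
  try {
    val stmt = conn.prepareStatement("INSERT INTO my_table (id) VALUES (?)")
    rows.foreach { r => stmt.setLong(1, r.getLong(0)); stmt.addBatch() }
    stmt.executeBatch()
    stmt.close()
  } finally conn.close()              // close() returns it to the pool
}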

a.moussa · 2,977 · 7 · 34 · 56

1 vote · 1 answer
How to tune mapping/filtering on big datasets (cross joined from two datasets)?
Spark 2.2.0
I have the following code, converted from a SQL script. It has been running for two hours and is still going, even slower than SQL Server. Is anything not done correctly?
The following is the plan,
Push table2 to all executors…
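When one side fits in executor memory, an explicit broadcast join matches the "push table2 to all executors" step of this plan while avoiding a shuffle of the large side; a hedged sketch with hypothetical names:

import org.apache.spark.sql.functions.broadcast

// The small table is shipped to every executor once; the join then runs
// locally per partition of the large table.
val joined = bigDf.join(broadcast(smallDf), bigDf("key") === smallDf("key"))

// Filter as early as possible so fewer rows reach the join and the map.
val result = joined.filter(bigDf("amount") > 0)  // hypothetical predicate
result.explain()  // look for BroadcastHashJoin in the plan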

ca9163d9 · 27,283 · 64 · 210 · 413