Questions tagged [spark-jdbc]

78 questions
0
votes
1 answer

Optimal parameters to speed up Spark df.write to PostgreSQL

I am trying to write a PySpark dataframe of ~3 million rows x 158 columns (~3 GB) to TimescaleDB. The write operation is executed from a Jupyter kernel with the following resources: 1 driver (2 vCPU, 2 GB memory), 2 executors (2 vCPU, 4GB…
Flxnt
  • 177
  • 4
  • 22
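For a bulk write like the one described above, a few JDBC options are commonly tuned. The sketch below shows the usual suspects, assuming a PostgreSQL-compatible target; host, table name, and the specific values are hypothetical, not taken from the question.

```python
# Sketch of JDBC write options often tuned for bulk inserts into
# Postgres/TimescaleDB; host, table, and values here are hypothetical.
jdbc_url = (
    "jdbc:postgresql://db-host:5432/metrics"
    "?reWriteBatchedInserts=true"  # Postgres driver flag: batches become multi-row INSERTs
)
write_options = {
    "url": jdbc_url,
    "dbtable": "public.readings",
    "driver": "org.postgresql.Driver",
    "batchsize": "10000",   # rows per JDBC batch (Spark's default is 1000)
    "numPartitions": "4",   # caps concurrent connections; match total executor cores
}
# With a SparkSession in scope:
# df.write.format("jdbc").options(**write_options).mode("append").save()
```

`numPartitions` on write also bounds the number of simultaneous connections, so a value near the total executor core count avoids overloading the database.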
0
votes
1 answer

Read Data from Postgres in parallel using spark for a table without integer primary key column

I am working on reading data from a Postgres table containing 102 million records for a specific quarter. The table has data for multiple quarters. Right now I am reading the data through the Spark JDBC connector and it is taking too much time to…
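When there is no integer primary key, `spark.read.jdbc` accepts a list of non-overlapping WHERE-clause predicates instead, producing one partition per predicate. A minimal sketch, assuming a `quarter` column (the column and values are hypothetical):

```python
# One partition per predicate; predicates must not overlap, or rows duplicate.
quarters = ["2019-Q1", "2019-Q2", "2019-Q3", "2019-Q4"]
predicates = [f"quarter = '{q}'" for q in quarters]
# df = spark.read.jdbc(url=jdbc_url, table="big_table",
#                      predicates=predicates, properties=conn_props)
```

Each predicate becomes the WHERE clause of one parallel JDBC query, so four predicates give four concurrent reads.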
0
votes
1 answer

spark jdbc - multiple connections to source?

Someone mentioned that when we use spark.read with JDBC, it generates a dataframe; if we then call df.write twice on that dataframe, **does it create two connections to the source?** I need some help with more insights on this…
0
votes
1 answer

How to parameterize writing a dataframe into a Hive table

I have a list of tables (across different categories) in an RDBMS that I want to extract and save in Hive, and I want to parameterize this in such a way that I can attach the category name to the output location in Hive. For example, I have a…
Ope Baba
  • 65
  • 8
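One way to parameterize this is to drive the extraction from a category-to-tables mapping and derive the Hive output location from the category name. The names and paths below are hypothetical, just to show the shape:

```python
# Hypothetical category -> tables mapping; output path built from the category.
categories = {"finance": ["sales", "invoices"], "hr": ["employees"]}
output_locations = {
    table: f"/warehouse/{category}/{table}"
    for category, tables in categories.items()
    for table in tables
}
# for category, tables in categories.items():
#     for table in tables:
#         (spark.read.format("jdbc").option("dbtable", table).load()
#              .write.mode("overwrite")
#              .option("path", f"/warehouse/{category}/{table}")
#              .saveAsTable(f"{category}.{table}"))
```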
0
votes
1 answer

How to write "all string" dataframe to Spark JDBC in Append mode to a target table with int and varchar columns

I create a Spark dataframe from a CSV file and try to insert it into an RDBMS table with integer and varchar columns. Since my dataframe is all string type, it fails in "append" mode. If I use overwrite mode, the RDBMS table will be recreated with all…
Despicable me
  • 548
  • 1
  • 9
  • 24
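The usual fix is to cast the all-string dataframe to the target table's column types before appending, so the JDBC insert sends correctly typed values. A sketch, where the target schema below is hypothetical:

```python
# Hypothetical target schema: cast each string column to the table's type.
target_types = {"id": "int", "name": "string", "amount": "double"}
cast_exprs = [f"CAST({col} AS {typ}) AS {col}" for col, typ in target_types.items()]
# typed_df = df.selectExpr(*cast_exprs)
# typed_df.write.format("jdbc").options(**opts).mode("append").save()
```

This keeps "append" mode (so the existing table and its constraints survive) while matching the column types.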
0
votes
1 answer

Unable to use DB2 COALESCE in Spark SQL

I have a table ENTITLE_USER from which I want to select user IDs if they are not null, else -1. For this I'm using the COALESCE function of DB2. I'm reading my query inside Spark like this: val df1 = spark.read .format("jdbc") …
Sparker0i
  • 1,787
  • 4
  • 35
  • 60
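Such a statement can be pushed down to DB2 either via the `query` option (Spark 2.4+) or by wrapping it as an aliased subquery in `dbtable`. A hedged sketch; the URL, driver class, and credentials are hypothetical:

```python
# Push the COALESCE down to DB2 rather than evaluating it in Spark.
query = "SELECT COALESCE(USER_ID, -1) AS USER_ID FROM ENTITLE_USER"
read_options = {
    "url": "jdbc:db2://db2-host:50000/MYDB",
    "query": query,                           # Spark 2.4+
    # Pre-2.4 alternative: "dbtable": f"({query}) AS t",
    "driver": "com.ibm.db2.jcc.DB2Driver",
}
# df1 = spark.read.format("jdbc").options(**read_options).load()
```

Note that `query` and `dbtable` are mutually exclusive; only one may be set per read.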
0
votes
1 answer

Is spark JDBC sink transactionally safe at node level?

I have a question related to opening a transaction at partition level. If I use the JDBC connector to write to a database (Postgres), will partition-specific writes at a worker node be transactionally safe, i.e. if a worker node goes down while writing the…
0
votes
0 answers

spark createTableColumnTypes for TEXT data type in postgres table

In a Spark dataset I'm using createTableColumnTypes to specify the database column data types to use instead of the defaults when creating the table. It works perfectly for VARCHAR(n), but if I use TEXT it throws an error. The code is written in…
Dev
  • 413
  • 10
  • 27
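A likely explanation: `createTableColumnTypes` is parsed as Spark SQL DDL, and `VARCHAR(n)` happens to be valid both as Spark SQL and as Postgres DDL, while `TEXT` is not a Spark SQL type, so the option rejects it at parse time. A sketch of a workaround (the table and column names are hypothetical), assuming this is indeed the cause:

```python
# VARCHAR(n) parses as Spark SQL DDL; Postgres-specific TEXT does not.
accepted = "name VARCHAR(1024), summary VARCHAR(4096)"   # parses
rejected = "body TEXT"                                   # not a Spark SQL type
# Workaround: create the table with TEXT columns outside Spark (psql/psycopg2),
# then append from Spark so no CREATE TABLE is issued.
create_stmt = "CREATE TABLE IF NOT EXISTS docs (name VARCHAR(100), body TEXT)"
# df.write.format("jdbc").options(**opts).mode("append").save()
```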
0
votes
1 answer

Is there any parameter partitioning when Spark reads RDBMS through JDBC?

When I run the spark application for table synchronization, the error message is as follows: 19/10/16 01:37:40 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 51) com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link…
Sin
  • 101
  • 2
  • 8
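Without partitioning options, a JDBC read runs as a single long-lived query on one connection, which can trip MySQL network timeouts on large tables (one plausible cause of a CommunicationsException). The four partitioning options split the scan; the bounds and sizes below are hypothetical:

```python
# Partitioned JDBC read: Spark issues numPartitions range queries over
# partitionColumn instead of one full scan. Values are hypothetical.
read_options = {
    "url": "jdbc:mysql://mysql-host:3306/warehouse",
    "dbtable": "orders",
    "partitionColumn": "order_id",  # must be numeric, date, or timestamp
    "lowerBound": "1",
    "upperBound": "50000000",
    "numPartitions": "16",
    "fetchsize": "10000",           # stream rows rather than buffering the result
}
# df = spark.read.format("jdbc").options(**read_options).load()
```

`lowerBound`/`upperBound` only shape the partition ranges; rows outside them are still read, just by the edge partitions.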
0
votes
1 answer

Problem connecting to SQL Server from PySpark 2.4 to write data

I am using PySpark 2.4 and want to write data to SQL Server, which isn't working. I've placed the jar file downloaded from here in the Spark path: D:\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\jars\ But to no avail. Following is the pyspark…
Aakash Basu
  • 1,689
  • 7
  • 28
  • 57
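For a write like this, the driver class must match the jar on the classpath. A sketch assuming the Microsoft `mssql-jdbc` driver (host, database, and credentials are hypothetical placeholders):

```python
# Sketch: Microsoft JDBC driver class and URL format, assuming the mssql-jdbc
# jar is on the classpath (dropped into jars/ or passed via spark-submit --jars).
sqlserver_options = {
    "url": "jdbc:sqlserver://sql-host:1433;databaseName=mydb",
    "dbtable": "dbo.target_table",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "user": "spark_user",   # placeholder
    "password": "***",      # placeholder
}
# df.write.format("jdbc").options(**sqlserver_options).mode("append").save()
```

If a jar in `jars\` is still not picked up, passing it explicitly with `spark-submit --jars <path-to-jar>` is a common check.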
0
votes
1 answer

Throwing NullPointerException while reading from MySQL in Apache Spark with Scala

I am trying to read data from MySQL but it throws a NullPointerException, and I'm not sure of the reason. Code in main.scala: object main extends App { val dt = args.lift(0) if (dt.isEmpty || !PairingbatchUtil.validatePartitionDate(dt.get)) { …
0
votes
1 answer

Spark dataframe to BigQuery using Simba driver

While trying to write a dataframe to BigQuery using the Simba driver, I am getting the below exception. Below is the dataframe; I have created a table in BigQuery with the same schema. df.printSchema root |-- empid: integer (nullable = true) |-- firstname:…
Mohan
  • 221
  • 1
  • 21
0
votes
0 answers

ERROR: column "blob" is of type jsonb but expression is of type character

val parquetDF = session.read.parquet("s3a://test/ovd").selectExpr("id", "topic", "update_id", "blob") Trying to read a parquet file and dump it into Postgres. One of the columns in the Postgres table is of JSONB datatype, and in parquet it is String…
mdev
  • 1,366
  • 17
  • 23
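The Postgres JDBC driver supports a `stringtype=unspecified` URL parameter that sends string parameters as untyped values, letting the server cast them to `jsonb` on insert. A sketch; the host and table name are hypothetical:

```python
# With stringtype=unspecified, Postgres casts the string parameter to the
# column's type (jsonb here) instead of rejecting it as character data.
jdbc_url = "jdbc:postgresql://pg-host:5432/test?stringtype=unspecified"
# parquetDF.write.format("jdbc").option("url", jdbc_url) \
#          .option("dbtable", "events").mode("append").save()
```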
0
votes
1 answer

How to fix initialization of Logger error while using spark-submit command

I've got a problem when running my spark-jdbc job to connect to another DB, but I get an error before that: Exception in thread "main" java.lang.AbstractMethodError at…
0
votes
1 answer

Is there a way to define "partitionColumn" in option("partitionColumn", "colname") in Spark-JDBC if the column is of datatype String?

I am trying to load data from an RDBMS to a Hive table on HDFS. I am reading the RDBMS table as follows: val mydata = spark.read .format("jdbc") .option("url", connection) .option("dbtable", "select * from dev.userlocations") …
Torque
  • 99
  • 3
  • 16
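Since `partitionColumn` must be numeric, date, or timestamp, one workaround is to derive a numeric bucket from the string key inside a subquery and partition on that. The sketch below uses MySQL-style functions purely as an illustration; the bucket column, hash function, and bucket count are all hypothetical and would need adapting to the actual RDBMS:

```python
# Derive a numeric bucket from the string column and partition on it.
num_buckets = 8
dbtable = (
    "(SELECT u.*, MOD(CRC32(location), {n}) AS bucket "
    "FROM dev.userlocations u) t"
).format(n=num_buckets)
read_options = {
    "dbtable": dbtable,
    "partitionColumn": "bucket",
    "lowerBound": "0",
    "upperBound": str(num_buckets),
    "numPartitions": str(num_buckets),
}
# df = spark.read.format("jdbc").option("url", connection) \
#          .options(**read_options).load()
```

An alternative that avoids the subquery entirely is the `predicates` argument of `spark.read.jdbc`, with one WHERE clause per distinct value range of the string column.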