Questions tagged [spark-jdbc]

78 questions
0
votes
1 answer

Optimal parameters to speed up Spark df.write to PostgreSQL

I am trying to write a PySpark dataframe of ~3 million rows x 158 columns (~3 GB) to TimescaleDB. The write operation is executed from a Jupyter kernel with the following resources: 1 driver (2 vCPU, 2 GB memory), 2 executors (2 vCPU, 4GB…
Flxnt
  • 177
  • 4
  • 22
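For a bulk write like the one described above, a few JDBC options are commonly tuned. The sketch below shows the usual suspects, assuming a PostgreSQL-compatible target; host, table name, and the specific values are hypothetical, not taken from the question.

```python
# Sketch of JDBC write options often tuned for bulk inserts into
# Postgres/TimescaleDB; host, table, and values here are hypothetical.
jdbc_url = (
    "jdbc:postgresql://db-host:5432/metrics"
    "?reWriteBatchedInserts=true"  # Postgres driver flag: batches become multi-row INSERTs
)
write_options = {
    "url": jdbc_url,
    "dbtable": "public.readings",
    "driver": "org.postgresql.Driver",
    "batchsize": "10000",   # rows per JDBC batch (Spark's default is 1000)
    "numPartitions": "4",   # caps concurrent connections; match total executor cores
}
# With a SparkSession in scope:
# df.write.format("jdbc").options(**write_options).mode("append").save()
```

`numPartitions` on write also bounds the number of simultaneous connections, so a value near the total executor core count avoids overloading the database.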
0
votes
1 answer

Read Data from Postgres in parallel using spark for a table without integer primary key column

I am working on reading data from a Postgres table containing 102 million records for a specific quarter. The table has data for multiple quarters. Right now I am reading the data through the Spark JDBC connector and it is taking too much time to…
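When there is no integer primary key, `spark.read.jdbc` accepts a list of non-overlapping WHERE-clause predicates instead, producing one partition per predicate. A minimal sketch, assuming a `quarter` column (the column and values are hypothetical):

```python
# One partition per predicate; predicates must not overlap, or rows duplicate.
quarters = ["2019-Q1", "2019-Q2", "2019-Q3", "2019-Q4"]
predicates = [f"quarter = '{q}'" for q in quarters]
# df = spark.read.jdbc(url=jdbc_url, table="big_table",
#                      predicates=predicates, properties=conn_props)
```

Each predicate becomes the WHERE clause of one parallel JDBC query, so four predicates give four concurrent reads.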
0
votes
1 answer

spark jdbc - multiple connections to source?

Someone mentioned that when we use spark.read with JDBC, it generates a dataframe; if we then call df.write twice on that dataframe, **does it create two connections to the source?** I need some help with more insights on this…
0
votes
1 answer

How to parameterize writing a dataframe into a Hive table

I have a list of tables (across different categories) in an RDBMS that I want to extract and save in Hive, and I want to parameterize this in such a way that I can attach the category name to the output location in Hive. For example, I have a…
Ope Baba
  • 65
  • 8
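One way to parameterize this is to drive the extraction from a category-to-tables mapping and derive the Hive output location from the category name. The names and paths below are hypothetical, just to show the shape:

```python
# Hypothetical category -> tables mapping; output path built from the category.
categories = {"finance": ["sales", "invoices"], "hr": ["employees"]}
output_locations = {
    table: f"/warehouse/{category}/{table}"
    for category, tables in categories.items()
    for table in tables
}
# for category, tables in categories.items():
#     for table in tables:
#         (spark.read.format("jdbc").option("dbtable", table).load()
#              .write.mode("overwrite")
#              .option("path", f"/warehouse/{category}/{table}")
#              .saveAsTable(f"{category}.{table}"))
```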
0
votes
1 answer

How to write "all string" dataframe to Spark JDBC in Append mode to a target table with int and varchar columns

I create a Spark dataframe from a CSV file and try to insert it into an RDBMS table with integer and varchar columns. Since my dataframe is all string type, it fails in "append" mode. If I use overwrite mode, the RDBMS table will be recreated with all…
Despicable me
  • 548
  • 1
  • 9
  • 24
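The usual fix is to cast the all-string dataframe to the target table's column types before appending, so the JDBC insert sends correctly typed values. A sketch, where the target schema below is hypothetical:

```python
# Hypothetical target schema: cast each string column to the table's type.
target_types = {"id": "int", "name": "string", "amount": "double"}
cast_exprs = [f"CAST({col} AS {typ}) AS {col}" for col, typ in target_types.items()]
# typed_df = df.selectExpr(*cast_exprs)
# typed_df.write.format("jdbc").options(**opts).mode("append").save()
```

This keeps "append" mode (so the existing table and its constraints survive) while matching the column types.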
0
votes
1 answer

Unable to use DB2 COALESCE in Spark SQL

I have a table ENTITLE_USER from which I want to select user IDs if they are not null, else -1. For this I'm using the COALESCE function of DB2. I'm reading my query inside Spark like this: val df1 = spark.read .format("jdbc") …
Sparker0i
  • 1,787
  • 4
  • 35
  • 60
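Such a statement can be pushed down to DB2 either via the `query` option (Spark 2.4+) or by wrapping it as an aliased subquery in `dbtable`. A hedged sketch; the URL, driver class, and credentials are hypothetical:

```python
# Push the COALESCE down to DB2 rather than evaluating it in Spark.
query = "SELECT COALESCE(USER_ID, -1) AS USER_ID FROM ENTITLE_USER"
read_options = {
    "url": "jdbc:db2://db2-host:50000/MYDB",
    "query": query,                           # Spark 2.4+
    # Pre-2.4 alternative: "dbtable": f"({query}) AS t",
    "driver": "com.ibm.db2.jcc.DB2Driver",
}
# df1 = spark.read.format("jdbc").options(**read_options).load()
```

Note that `query` and `dbtable` are mutually exclusive; only one may be set per read.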
0
votes
1 answer

Is spark JDBC sink transactionally safe at node level?

I have a question related to opening a transaction at partition level. If I use the JDBC connector to write to a database (Postgres), will partition-specific writes at a worker node be transactionally safe, i.e. if a worker node goes down while writing the…
0
votes
0 answers

spark createTableColumnTypes for TEXT data type in postgres table

In a Spark dataset I'm using createTableColumnTypes to specify the database column data types to use instead of the defaults when creating the table. It works perfectly for VARCHAR(n), but if I use TEXT it throws an error. The code is written in…
Dev
  • 413
  • 10
  • 27
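A likely explanation: `createTableColumnTypes` is parsed as Spark SQL DDL, and `VARCHAR(n)` happens to be valid both as Spark SQL and as Postgres DDL, while `TEXT` is not a Spark SQL type, so the option rejects it at parse time. A sketch of a workaround (the table and column names are hypothetical), assuming this is indeed the cause:

```python
# VARCHAR(n) parses as Spark SQL DDL; Postgres-specific TEXT does not.
accepted = "name VARCHAR(1024), summary VARCHAR(4096)"   # parses
rejected = "body TEXT"                                   # not a Spark SQL type
# Workaround: create the table with TEXT columns outside Spark (psql/psycopg2),
# then append from Spark so no CREATE TABLE is issued.
create_stmt = "CREATE TABLE IF NOT EXISTS docs (name VARCHAR(100), body TEXT)"
# df.write.format("jdbc").options(**opts).mode("append").save()
```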
0
votes
1 answer

Is there any parameter partitioning when Spark reads RDBMS through JDBC?

When I run the spark application for table synchronization, the error message is as follows: 19/10/16 01:37:40 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 51) com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link…
Sin
  • 101
  • 2
  • 8
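Without partitioning options, a JDBC read runs as a single long-lived query on one connection, which can trip MySQL network timeouts on large tables (one plausible cause of a CommunicationsException). The four partitioning options split the scan; the bounds and sizes below are hypothetical:

```python
# Partitioned JDBC read: Spark issues numPartitions range queries over
# partitionColumn instead of one full scan. Values are hypothetical.
read_options = {
    "url": "jdbc:mysql://mysql-host:3306/warehouse",
    "dbtable": "orders",
    "partitionColumn": "order_id",  # must be numeric, date, or timestamp
    "lowerBound": "1",
    "upperBound": "50000000",
    "numPartitions": "16",
    "fetchsize": "10000",           # stream rows rather than buffering the result
}
# df = spark.read.format("jdbc").options(**read_options).load()
```

`lowerBound`/`upperBound` only shape the partition ranges; rows outside them are still read, just by the edge partitions.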
0
votes
1 answer

Problem connecting to SQL Server from PySpark 2.4 to write data

I am using PySpark 2.4 and want to write data to SQL Server, which isn't working. I've placed the jar file downloaded from here in the Spark path: D:\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\jars\ But to no avail. Following is the pyspark…
Aakash Basu
  • 1,689
  • 7
  • 28
  • 57
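For a write like this, the driver class must match the jar on the classpath. A sketch assuming the Microsoft `mssql-jdbc` driver (host, database, and credentials are hypothetical placeholders):

```python
# Sketch: Microsoft JDBC driver class and URL format, assuming the mssql-jdbc
# jar is on the classpath (dropped into jars/ or passed via spark-submit --jars).
sqlserver_options = {
    "url": "jdbc:sqlserver://sql-host:1433;databaseName=mydb",
    "dbtable": "dbo.target_table",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "user": "spark_user",   # placeholder
    "password": "***",      # placeholder
}
# df.write.format("jdbc").options(**sqlserver_options).mode("append").save()
```

If a jar in `jars\` is still not picked up, passing it explicitly with `spark-submit --jars <path-to-jar>` is a common check.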
0
votes
1 answer

Throwing NullPointerException while reading from MySQL in Apache Spark with Scala

I am trying to read data from MySQL but it throws a NullPointerException, and I'm not sure of the reason. Code in main.scala: object main extends App { val dt = args.lift(0) if (dt.isEmpty || !PairingbatchUtil.validatePartitionDate(dt.get)) { …
0
votes
1 answer

Spark dataframe to BigQuery using Simba driver

While trying to write a dataframe to BigQuery using the Simba driver, I am getting the below exception. Below is the dataframe; I have created a table in BigQuery with the same schema. df.printSchema root |-- empid: integer (nullable = true) |-- firstname:…
Mohan
  • 221
  • 1
  • 21
0
votes
0 answers

ERROR: column "blob" is of type jsonb but expression is of type character

val parquetDF = session.read.parquet("s3a://test/ovd").selectExpr("id", "topic", "update_id", "blob") Trying to read a parquet file and dump it into Postgres. One of the columns in the Postgres table is of JSONB datatype, and in parquet it is String…
mdev
  • 1,366
  • 17
  • 23
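The Postgres JDBC driver supports a `stringtype=unspecified` URL parameter that sends string parameters as untyped values, letting the server cast them to `jsonb` on insert. A sketch; the host and table name are hypothetical:

```python
# With stringtype=unspecified, Postgres casts the string parameter to the
# column's type (jsonb here) instead of rejecting it as character data.
jdbc_url = "jdbc:postgresql://pg-host:5432/test?stringtype=unspecified"
# parquetDF.write.format("jdbc").option("url", jdbc_url) \
#          .option("dbtable", "events").mode("append").save()
```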
0
votes
1 answer

How to fix initialization of Logger error while using spark-submit command

I've got a problem when running my spark-jdbc job to connect to another DB, but I get an error before that: Exception in thread "main" java.lang.AbstractMethodError at…
0
votes
1 answer

Is there a way to define "partitionColumn" in option("partitionColumn", "colname") in Spark-JDBC if the column is of datatype String?

I am trying to load data from an RDBMS to a Hive table on HDFS. I am reading the RDBMS table as follows: val mydata = spark.read .format("jdbc") .option("url", connection) .option("dbtable", "select * from dev.userlocations") …
Torque
  • 99
  • 3
  • 16
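Since `partitionColumn` must be numeric, date, or timestamp, one workaround is to derive a numeric bucket from the string key inside a subquery and partition on that. The sketch below uses MySQL-style functions purely as an illustration; the bucket column, hash function, and bucket count are all hypothetical and would need adapting to the actual RDBMS:

```python
# Derive a numeric bucket from the string column and partition on it.
num_buckets = 8
dbtable = (
    "(SELECT u.*, MOD(CRC32(location), {n}) AS bucket "
    "FROM dev.userlocations u) t"
).format(n=num_buckets)
read_options = {
    "dbtable": dbtable,
    "partitionColumn": "bucket",
    "lowerBound": "0",
    "upperBound": str(num_buckets),
    "numPartitions": str(num_buckets),
}
# df = spark.read.format("jdbc").option("url", connection) \
#          .options(**read_options).load()
```

An alternative that avoids the subquery entirely is the `predicates` argument of `spark.read.jdbc`, with one WHERE clause per distinct value range of the string column.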