Questions tagged [spark-jdbc]

78 questions
0
votes
0 answers

Unable to write a Spark DataFrame to a Cloud Spanner table using the Google Spanner JDBC driver

Created a sample DataFrame with string data and am trying to write that row into a Google Spanner table which is already created in the database (Google SQL dialect Spanner DB). Trying to write as below: data.write.format("jdbc").option("url",…
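A minimal sketch of the kind of write the question describes, assuming an active SparkSession named spark and a DataFrame data; the project, instance, database, and table names are placeholders:

# Placeholder Spanner connection string and table; driver class is the Google Spanner JDBC driver.
spanner_url = "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db"

(data.write
    .format("jdbc")
    .option("url", spanner_url)
    .option("driver", "com.google.cloud.spanner.jdbc.JdbcDriver")
    .option("dbtable", "my_table")       # table must already exist in the Spanner database
    .mode("append")
    .save())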
0
votes
1 answer

Table gets deleted when trying to overwrite the data in it from Databricks Spark

I am trying to write DataFrame data into a table in Azure SQL from Databricks using pyspark. The table dbo.test already exists in the database. I am able to read it before I execute the write below…
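For context, a sketch of the write pattern that keeps the existing table: with mode("overwrite") Spark drops and recreates the table by default, but with truncate=true it issues TRUNCATE TABLE instead. The URL and credentials below are placeholders:

# Overwrite the rows but keep the table definition (no DROP/CREATE).
(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.test")
    .option("user", "...")
    .option("password", "...")
    .option("truncate", "true")
    .mode("overwrite")
    .save())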
0
votes
1 answer

How can we write data to an Azure Synapse dedicated SQL pool from Azure Databricks using a service principal?

I am trying to use the code below to write the data to a Synapse dedicated SQL pool table. The data is stored in ADLS Gen2 and I am trying to write a DataFrame into a SQL table. I also have a service principal created for Azure Databricks that I am also…
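A sketch of one way this is typically wired up: the service principal authenticates to ADLS Gen2 via OAuth, and the Databricks Synapse connector stages data through that storage. Storage account, tenant, secrets, URL, and table names are placeholders, and the connector option names (in particular enableServicePrincipalAuth) should be checked against the Azure Synapse connector docs for your runtime:

storage = "mystorageaccount"
spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net", "<app-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net", "<client-secret>")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

(df.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<dedicated-pool>")
    .option("dbTable", "dbo.my_table")
    .option("tempDir", f"abfss://tmp@{storage}.dfs.core.windows.net/synapse-staging")
    .option("enableServicePrincipalAuth", "true")
    .mode("append")
    .save())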
0
votes
1 answer

How to check whether a statement passed in JDBC_SESSION_INIT_STATEMENT is working? DataFrameReader

I am trying to connect to SQL Server with spark-jdbc, using JDBC_SESSION_INIT_STATEMENT to create a temporary table and then download data from the temporary table in the main query. I have the following code: //df is…
Gumada Yaroslav
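One way to verify the init statement actually runs, sketched below: sessionInitStatement executes on each connection Spark opens before reading, so making it do something observable (here, inserting a marker row into a hypothetical log table dbo.session_init_log) lets you confirm it on the server. URL, tables, and credentials are placeholders; note that, depending on the Spark version, the connection used for the initial schema probe may not run it, so a temp table created there is not guaranteed to exist when the query's schema is resolved:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
    .option("sessionInitStatement",
            "INSERT INTO dbo.session_init_log (ran_at) VALUES (GETDATE())")
    .option("dbtable", "dbo.big_table")
    .option("user", "...")
    .option("password", "...")
    .load())

df.show(5)   # forces at least one read connection, so the marker row should appear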
0
votes
0 answers

Pyspark - Count after load while reading from a JDBC connection takes more time with some jars

I am loading data from several databases, namely Redshift and Snowflake, into a PySpark DataFrame using the JDBC driver option. However, any action after that, i.e. count, group-by, or join, takes a long time in the case of Redshift but is very quick on the…
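For reference, a sketch of how the first action ends up paying the full transfer cost: a plain JDBC read pulls the whole table through a single connection, so partitioning the read and raising fetchsize is the usual mitigation. Column names, bounds, URL, and the driver class are placeholders and depend on the Redshift JDBC jar in use:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:redshift://my-cluster.redshift.amazonaws.com:5439/dev")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")   # depends on the jar version
    .option("dbtable", "public.events")
    .option("user", "...")
    .option("password", "...")
    .option("partitionColumn", "event_id")
    .option("lowerBound", "1")
    .option("upperBound", "100000000")
    .option("numPartitions", "16")
    .option("fetchsize", "10000")
    .load())

df.count()   # the count itself is cheap; the JDBC transfer dominates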
0
votes
1 answer

Pyspark Dataframe to AWS MySql: requirement failed: The driver could not open a JDBC connection

I want to write a pyspark dataframe into a MySQL table in AWS RDS, but I keep getting the error pyspark.sql.utils.IllegalArgumentException: requirement failed: The driver could not open a JDBC connection. Check the URL:…
Moritz
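That "Check the URL" requirement failure usually points at a malformed jdbc: URL or a missing driver jar. A sketch of a write with a well-formed MySQL URL for RDS; host, database, and credentials are placeholders:

(df.write
    .format("jdbc")
    .option("url", "jdbc:mysql://mydb.abc123.eu-central-1.rds.amazonaws.com:3306/mydatabase")
    .option("driver", "com.mysql.cj.jdbc.Driver")   # Connector/J 8 driver class
    .option("dbtable", "my_table")
    .option("user", "...")
    .option("password", "...")
    .mode("append")
    .save())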
0
votes
0 answers

Spark conversion error on the executor with the JDBC SQL Server partition options

I'm first reading the lower bound and upper bound using: select max(timestamp), min(timestamp) from <table name>, extracting Row row = query.collectAsList().get(0).getString(0) as the lowerbound and upperbound respectively, then passing lowerbound and upper…
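A sketch of the same pattern that keeps the bounds as timestamps instead of calling getString on them, since the executors parse lowerBound/upperBound back into the partition column's type. URL, table, and column names are placeholders:

url = "jdbc:sqlserver://myhost:1433;databaseName=mydb"

row = (spark.read
    .format("jdbc")
    .option("url", url)
    .option("query", "SELECT MIN(ts) AS lo, MAX(ts) AS hi FROM dbo.events")
    .option("user", "...")
    .option("password", "...")
    .load()
    .collect()[0])

df = (spark.read
    .format("jdbc")
    .option("url", url)
    .option("dbtable", "dbo.events")
    .option("partitionColumn", "ts")
    .option("lowerBound", str(row["lo"]))   # e.g. "2021-01-01 00:00:00"
    .option("upperBound", str(row["hi"]))
    .option("numPartitions", "8")
    .option("user", "...")
    .option("password", "...")
    .load())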
0
votes
1 answer

Getting py4j.protocol.Py4JJavaError: An error occurred while calling o65.jdbc. : java.sql.SQLException: Unsupported type TIMESTAMP_WITH_TIMEZONE

I am making a JDBC connection to a Denodo database using pyspark. The table that I am connecting to contains the "TIMESTAMP_WITH_TIMEZONE" datatype for 2 columns. Since Spark provides built-in JDBC connections to only a handful of DBs, of which Denodo is not a…
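One common workaround, sketched here, is to cast the unsupported columns inside a pushed-down subquery so the driver hands Spark a plain TIMESTAMP. The view and column names are placeholders, and the Denodo URL format and driver class are written from memory of the Denodo VDP JDBC driver, so verify them against your driver jar:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:vdb://denodo-host:9999/my_vdb")
    .option("driver", "com.denodo.vdp.jdbc.Driver")
    .option("dbtable", "(SELECT id, CAST(created_tz AS TIMESTAMP) AS created_ts FROM my_view) t")
    .option("user", "...")
    .option("password", "...")
    .load())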
0
votes
1 answer

Why does Spark JDBC infer the table schema even when a schema is specified?

I'm using spark.read.format("jdbc").option("query", tmpSql) to load a table from MySQL, and I can see the query select * from (xxx) where 1=0 in the database monitor; I later learned this query is used for inferring the table schema in Spark. However, when I…
JianZhang
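For context: Spark still runs the "... WHERE 1=0" probe to resolve column names even when customSchema is supplied; customSchema only overrides the Spark-side types on top of that result, and the probe returns no rows, so it is normally cheap. A sketch with placeholder names:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://myhost:3306/mydb")
    .option("query", "SELECT id, amount FROM orders")
    .option("customSchema", "id LONG, amount DECIMAL(20,4)")   # types only; the probe still runs
    .option("user", "...")
    .option("password", "...")
    .load())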
0
votes
0 answers

Spark JDBC API gives an error while accessing a Hive table with a Map data type column

I have a Hive table tableA with the following format: > desc tableA; +--------------------------+-----------------------+-----------------------+--+ | col_name | data_type | comment …
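A sketch of one workaround under these assumptions: Spark's JDBC source has no mapping for Hive MAP columns, so the pushed-down subquery either skips the map column or extracts individual keys from it. The host, the props column, and the some_key key are hypothetical names:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:hive2://hive-host:10000/default")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("dbtable", "(SELECT id, props['some_key'] AS some_value FROM tableA) t")
    .option("user", "...")
    .option("password", "...")
    .load())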
0
votes
1 answer

Spark JDBC - Read -> update -> write huge table without primary key

I am trying to update a few fields of each row of a big MySQL table (with close to 500 million rows). The table doesn't have any primary key (or has a string primary key like a UUID). I don't have enough executor memory to read and hold the…
Sunny Gupta
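A sketch of one approach when there is no numeric key: spark.read.jdbc(..., predicates=[...]) splits the read into one partition per WHERE clause, for example on the first hex character of a UUID column, and the transformed result goes to a staging table that is merged or renamed on the MySQL side instead of updating 500M rows in place. URL, table, and column names are placeholders, as is the lit("processed") stand-in for the real update logic:

from pyspark.sql import functions as F

predicates = [f"substr(uuid_col, 1, 1) = '{c}'" for c in "0123456789abcdef"]

df = spark.read.jdbc(
    url="jdbc:mysql://myhost:3306/mydb",
    table="big_table",
    predicates=predicates,                     # 16 partitions, one per leading hex char
    properties={"user": "...", "password": "...", "fetchsize": "10000"},
)

updated = df.withColumn("status", F.lit("processed"))

(updated.write
    .format("jdbc")
    .option("url", "jdbc:mysql://myhost:3306/mydb")
    .option("dbtable", "big_table_staged")     # staging table; merge/rename server-side
    .option("user", "...")
    .option("password", "...")
    .mode("overwrite")
    .save())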
0
votes
0 answers

Spark JDBC - Read/update/write huge table without int/long primary key

I am trying to update certain columns in a big MySQL table that does not have any primary key. How can I handle such big tables if the size is, e.g., 6 GB, and my executor memory is only 2 GB? Do you think Spark-ODBC would help me somehow? If I would…
Sunny Gupta
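For the memory constraint specifically, a sketch of an alternative: stream each partition and issue UPDATE statements directly from the executors, so no single executor ever holds the full 6 GB. It assumes the mysql-connector-python package is available on the executors and that the DataFrame already carries hypothetical uuid_col and new_status columns; host and credentials are placeholders:

def update_partition(rows):
    import mysql.connector
    conn = mysql.connector.connect(host="myhost", database="mydb",
                                   user="...", password="...")
    cur = conn.cursor()
    for row in rows:
        cur.execute("UPDATE big_table SET status = %s WHERE uuid_col = %s",
                    (row["new_status"], row["uuid_col"]))
    conn.commit()
    conn.close()

df.select("uuid_col", "new_status").foreachPartition(update_partition)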
0
votes
1 answer

Spark JDBC read API: Determining the number of partitions dynamically for a column of type datetime

I'm trying to read a table from an RDS MySQL instance using PySpark. It's a huge table, hence I want to parallelize the read operation by making use of the partitioning concept. The table doesn't have a numeric column to find the number of…
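A sketch of deriving the bounds and the partition count from the data itself: Spark (2.4+) accepts a date/timestamp partition column, with lowerBound/upperBound passed as strings. URL, table, and column names are placeholders, and the one-partition-per-day heuristic is just an example:

url = "jdbc:mysql://my-rds.abc123.rds.amazonaws.com:3306/mydb"
props = {"user": "...", "password": "..."}

bounds = spark.read.jdbc(
    url=url,
    table="(SELECT MIN(modified_at) AS lo, MAX(modified_at) AS hi FROM events) b",
    properties=props,
).collect()[0]

days = max((bounds["hi"] - bounds["lo"]).days, 1)
num_partitions = min(days, 64)          # cap so tasks stay a reasonable size

df = spark.read.jdbc(
    url=url,
    table="events",
    column="modified_at",               # datetime partition column
    lowerBound=str(bounds["lo"]),
    upperBound=str(bounds["hi"]),
    numPartitions=num_partitions,
    properties=props,
)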
0
votes
1 answer

Pyspark - SQL server 2005 - SQL Exception 151

I'm facing an issue fetching data from a database hosted on SQL Server 2005 through pyspark. I have a table with 5 columns: index -> int, category -> nvarchar, date_modified -> datetime (format YYYY-MM-DD HH:MM:SS:SSS), category2 ->…
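A sketch of the read, with one common way to sidestep driver-side datetime conversion problems on old SQL Server versions: convert the datetime to a string in the pushed-down subquery and cast it back in Spark. Host, table, and column names are placeholders:

from pyspark.sql import functions as F

raw = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
    .option("dbtable",
            "(SELECT [index], category, "
            "CONVERT(varchar(23), date_modified, 121) AS date_modified "
            "FROM dbo.my_table) t")
    .option("user", "...")
    .option("password", "...")
    .load())

df = raw.withColumn("date_modified",
                    F.to_timestamp("date_modified", "yyyy-MM-dd HH:mm:ss.SSS"))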
0
votes
1 answer

Spark JDBC write to Teradata: multiple Spark tasks failing with a "Transaction ABORTed due to deadlock" error, resulting in stage failure

I am using Spark JDBC write to load data from Hive to a Teradata view. I am using 200 vcores and have partitioned the data into 10,000 partitions. Spark tasks are failing with the below error, resulting in stage failure. Sometimes the application finishes…
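Deadlocks on the database side usually come from too many concurrent writers, so a sketch of the usual first mitigation is to cut the write parallelism well below 10,000 and batch the inserts. Host, database, view, and credentials are placeholders:

(df.coalesce(8)                          # far fewer concurrent JDBC connections
    .write
    .format("jdbc")
    .option("url", "jdbc:teradata://td-host/DATABASE=mydb")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "mydb.my_view")
    .option("user", "...")
    .option("password", "...")
    .option("batchsize", "10000")        # larger JDBC batches per round trip
    .mode("append")
    .save())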