Questions tagged [spark-jdbc]

78 questions
0
votes
0 answers

Unable to write a Spark DataFrame to a Cloud Spanner table using the Google Spanner JDBC driver

Created a sample DataFrame with string data and am trying to write that row into a Google Spanner table which is already created in the database (Google SQL dialect Spanner DB). Trying to write as below: data.write.format("jdbc").option("url",…
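A minimal sketch of the kind of write the question describes, assuming an active SparkSession named spark and a DataFrame data; the project, instance, database, and table names are placeholders:

# Placeholder Spanner connection string and table; driver class is the Google Spanner JDBC driver.
spanner_url = "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db"

(data.write
    .format("jdbc")
    .option("url", spanner_url)
    .option("driver", "com.google.cloud.spanner.jdbc.JdbcDriver")
    .option("dbtable", "my_table")       # table must already exist in the Spanner database
    .mode("append")
    .save())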
0
votes
1 answer

Table gets deleted when trying to overwrite the data in it from Databricks Spark

I am trying to write DataFrame data into a table in Azure SQL from Databricks using pyspark. The table dbo.test already exists in the database. I am able to read it before I execute the write below…
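For context, a sketch of the write pattern that keeps the existing table: with mode("overwrite") Spark drops and recreates the table by default, but with truncate=true it issues TRUNCATE TABLE instead. The URL and credentials below are placeholders:

# Overwrite the rows but keep the table definition (no DROP/CREATE).
(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.test")
    .option("user", "...")
    .option("password", "...")
    .option("truncate", "true")
    .mode("overwrite")
    .save())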
0
votes
1 answer

How can we write data to an Azure Synapse dedicated SQL pool from Azure Databricks using a service principal?

I am trying to use the code below to write the data to a Synapse dedicated SQL pool table. The data is stored in ADLS Gen2 and I am trying to write a DataFrame into a SQL table. I also have a service principal created for Azure Databricks that I am also…
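A sketch of one way this is typically wired up: the service principal authenticates to ADLS Gen2 via OAuth, and the Databricks Synapse connector stages data through that storage. Storage account, tenant, secrets, URL, and table names are placeholders, and the connector option names (in particular enableServicePrincipalAuth) should be checked against the Azure Synapse connector docs for your runtime:

storage = "mystorageaccount"
spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net", "<app-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net", "<client-secret>")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

(df.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<dedicated-pool>")
    .option("dbTable", "dbo.my_table")
    .option("tempDir", f"abfss://tmp@{storage}.dfs.core.windows.net/synapse-staging")
    .option("enableServicePrincipalAuth", "true")
    .mode("append")
    .save())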
0
votes
1 answer

How to check whether a statement passed in JDBC_SESSION_INIT_STATEMENT is working? DataFrameReader

I am trying to connect to SQL Server with spark-jdbc, using JDBC_SESSION_INIT_STATEMENT to create a temporary table and then download data from the temporary table in the main query. I have the following code: //df is…
Gumada Yaroslav
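One way to verify the init statement actually runs, sketched below: sessionInitStatement executes on each connection Spark opens before reading, so making it do something observable (here, inserting a marker row into a hypothetical log table dbo.session_init_log) lets you confirm it on the server. URL, tables, and credentials are placeholders; note that, depending on the Spark version, the connection used for the initial schema probe may not run it, so a temp table created there is not guaranteed to exist when the query's schema is resolved:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
    .option("sessionInitStatement",
            "INSERT INTO dbo.session_init_log (ran_at) VALUES (GETDATE())")
    .option("dbtable", "dbo.big_table")
    .option("user", "...")
    .option("password", "...")
    .load())

df.show(5)   # forces at least one read connection, so the marker row should appear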
0
votes
0 answers

Pyspark - Count after load while reading from a JDBC connection takes more time with some jars

I am loading data from several databases, namely Redshift and Snowflake, into a PySpark DataFrame using the JDBC driver option. However, any action after that, i.e. count, group-by, or join, takes a long time in the case of Redshift but is very quick on the…
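For reference, a sketch of how the first action ends up paying the full transfer cost: a plain JDBC read pulls the whole table through a single connection, so partitioning the read and raising fetchsize is the usual mitigation. Column names, bounds, URL, and the driver class are placeholders and depend on the Redshift JDBC jar in use:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:redshift://my-cluster.redshift.amazonaws.com:5439/dev")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")   # depends on the jar version
    .option("dbtable", "public.events")
    .option("user", "...")
    .option("password", "...")
    .option("partitionColumn", "event_id")
    .option("lowerBound", "1")
    .option("upperBound", "100000000")
    .option("numPartitions", "16")
    .option("fetchsize", "10000")
    .load())

df.count()   # the count itself is cheap; the JDBC transfer dominates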
0
votes
1 answer

Pyspark Dataframe to AWS MySql: requirement failed: The driver could not open a JDBC connection

I want to write a pyspark dataframe into a MySQL table in AWS RDS, but I keep getting the error pyspark.sql.utils.IllegalArgumentException: requirement failed: The driver could not open a JDBC connection. Check the URL:…
Moritz
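That "Check the URL" requirement failure usually points at a malformed jdbc: URL or a missing driver jar. A sketch of a write with a well-formed MySQL URL for RDS; host, database, and credentials are placeholders:

(df.write
    .format("jdbc")
    .option("url", "jdbc:mysql://mydb.abc123.eu-central-1.rds.amazonaws.com:3306/mydatabase")
    .option("driver", "com.mysql.cj.jdbc.Driver")   # Connector/J 8 driver class
    .option("dbtable", "my_table")
    .option("user", "...")
    .option("password", "...")
    .mode("append")
    .save())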
0
votes
0 answers

Spark conversion error on the executor with the JDBC SQL Server partition options

I'm first reading the lower bound and upper bound using: select max(timestamp), min(timestamp) from <table name>, extracting Row row = query.collectAsList().get(0).getString(0) as the lowerbound and upperbound respectively, then passing lowerbound and upper…
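A sketch of the same pattern that keeps the bounds as timestamps instead of calling getString on them, since the executors parse lowerBound/upperBound back into the partition column's type. URL, table, and column names are placeholders:

url = "jdbc:sqlserver://myhost:1433;databaseName=mydb"

row = (spark.read
    .format("jdbc")
    .option("url", url)
    .option("query", "SELECT MIN(ts) AS lo, MAX(ts) AS hi FROM dbo.events")
    .option("user", "...")
    .option("password", "...")
    .load()
    .collect()[0])

df = (spark.read
    .format("jdbc")
    .option("url", url)
    .option("dbtable", "dbo.events")
    .option("partitionColumn", "ts")
    .option("lowerBound", str(row["lo"]))   # e.g. "2021-01-01 00:00:00"
    .option("upperBound", str(row["hi"]))
    .option("numPartitions", "8")
    .option("user", "...")
    .option("password", "...")
    .load())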
0
votes
1 answer

Getting py4j.protocol.Py4JJavaError: An error occurred while calling o65.jdbc. : java.sql.SQLException: Unsupported type TIMESTAMP_WITH_TIMEZONE

I am making a JDBC connection to a Denodo database using pyspark. The table that I am connecting to contains the "TIMESTAMP_WITH_TIMEZONE" datatype for 2 columns. Since Spark provides built-in JDBC connections to only a handful of DBs, of which Denodo is not a…
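One common workaround, sketched here, is to cast the unsupported columns inside a pushed-down subquery so the driver hands Spark a plain TIMESTAMP. The view and column names are placeholders, and the Denodo URL format and driver class are written from memory of the Denodo VDP JDBC driver, so verify them against your driver jar:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:vdb://denodo-host:9999/my_vdb")
    .option("driver", "com.denodo.vdp.jdbc.Driver")
    .option("dbtable", "(SELECT id, CAST(created_tz AS TIMESTAMP) AS created_ts FROM my_view) t")
    .option("user", "...")
    .option("password", "...")
    .load())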
0
votes
1 answer

Why does Spark JDBC infer the table schema even when a schema is specified?

I'm using spark.read.format("jdbc").option("query", tmpSql) to load a table from MySQL, and I can see the query select * from (xxx) where 1=0 in the database monitor; I later learned this query is used for inferring the table schema in Spark. However, when I…
JianZhang
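For context: Spark still runs the "... WHERE 1=0" probe to resolve column names even when customSchema is supplied; customSchema only overrides the Spark-side types on top of that result, and the probe returns no rows, so it is normally cheap. A sketch with placeholder names:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://myhost:3306/mydb")
    .option("query", "SELECT id, amount FROM orders")
    .option("customSchema", "id LONG, amount DECIMAL(20,4)")   # types only; the probe still runs
    .option("user", "...")
    .option("password", "...")
    .load())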
0
votes
0 answers

Spark JDBC API gives an error while accessing a Hive table with a Map data type column

I have a Hive table tableA with the following format: > desc tableA; +--------------------------+-----------------------+-----------------------+--+ | col_name | data_type | comment …
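A sketch of one workaround under these assumptions: Spark's JDBC source has no mapping for Hive MAP columns, so the pushed-down subquery either skips the map column or extracts individual keys from it. The host, the props column, and the some_key key are hypothetical names:

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:hive2://hive-host:10000/default")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("dbtable", "(SELECT id, props['some_key'] AS some_value FROM tableA) t")
    .option("user", "...")
    .option("password", "...")
    .load())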
0
votes
1 answer

Spark JDBC - Read -> update -> write huge table without primary key

I am trying to update a few fields of each row of a big MySQL table (with close to 500 million rows). The table doesn't have any primary key (or has a string primary key like a UUID). I don't have enough executor memory to read and hold the…
Sunny Gupta
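A sketch of one approach when there is no numeric key: spark.read.jdbc(..., predicates=[...]) splits the read into one partition per WHERE clause, for example on the first hex character of a UUID column, and the transformed result goes to a staging table that is merged or renamed on the MySQL side instead of updating 500M rows in place. URL, table, and column names are placeholders, as is the lit("processed") stand-in for the real update logic:

from pyspark.sql import functions as F

predicates = [f"substr(uuid_col, 1, 1) = '{c}'" for c in "0123456789abcdef"]

df = spark.read.jdbc(
    url="jdbc:mysql://myhost:3306/mydb",
    table="big_table",
    predicates=predicates,                     # 16 partitions, one per leading hex char
    properties={"user": "...", "password": "...", "fetchsize": "10000"},
)

updated = df.withColumn("status", F.lit("processed"))

(updated.write
    .format("jdbc")
    .option("url", "jdbc:mysql://myhost:3306/mydb")
    .option("dbtable", "big_table_staged")     # staging table; merge/rename server-side
    .option("user", "...")
    .option("password", "...")
    .mode("overwrite")
    .save())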
0
votes
0 answers

Spark JDBC - Read/update/write huge table without int/long primary key

I am trying to update certain columns in a big MySQL table that does not have any primary key. How can I handle such big tables if the size is, e.g., 6 GB, and my executor memory is only 2 GB? Do you think Spark-ODBC would help me somehow? If I would…
Sunny Gupta
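For the memory constraint specifically, a sketch of an alternative: stream each partition and issue UPDATE statements directly from the executors, so no single executor ever holds the full 6 GB. It assumes the mysql-connector-python package is available on the executors and that the DataFrame already carries hypothetical uuid_col and new_status columns; host and credentials are placeholders:

def update_partition(rows):
    import mysql.connector
    conn = mysql.connector.connect(host="myhost", database="mydb",
                                   user="...", password="...")
    cur = conn.cursor()
    for row in rows:
        cur.execute("UPDATE big_table SET status = %s WHERE uuid_col = %s",
                    (row["new_status"], row["uuid_col"]))
    conn.commit()
    conn.close()

df.select("uuid_col", "new_status").foreachPartition(update_partition)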
0
votes
1 answer

Spark JDBC read API: Determining the number of partitions dynamically for a column of type datetime

I'm trying to read a table from an RDS MySQL instance using PySpark. It's a huge table, hence I want to parallelize the read operation by making use of the partitioning concept. The table doesn't have a numeric column to find the number of…
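A sketch of deriving the bounds and the partition count from the data itself: Spark (2.4+) accepts a date/timestamp partition column, with lowerBound/upperBound passed as strings. URL, table, and column names are placeholders, and the one-partition-per-day heuristic is just an example:

url = "jdbc:mysql://my-rds.abc123.rds.amazonaws.com:3306/mydb"
props = {"user": "...", "password": "..."}

bounds = spark.read.jdbc(
    url=url,
    table="(SELECT MIN(modified_at) AS lo, MAX(modified_at) AS hi FROM events) b",
    properties=props,
).collect()[0]

days = max((bounds["hi"] - bounds["lo"]).days, 1)
num_partitions = min(days, 64)          # cap so tasks stay a reasonable size

df = spark.read.jdbc(
    url=url,
    table="events",
    column="modified_at",               # datetime partition column
    lowerBound=str(bounds["lo"]),
    upperBound=str(bounds["hi"]),
    numPartitions=num_partitions,
    properties=props,
)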
0
votes
1 answer

Pyspark - SQL server 2005 - SQL Exception 151

I'm facing an issue fetching data from a database hosted on SQL Server 2005 through pyspark. I have a table with 5 columns: index -> int, category -> nvarchar, date_modified -> datetime (format YYYY-MM-DD HH:MM:SS:SSS), category2 ->…
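A sketch of the read, with one common way to sidestep driver-side datetime conversion problems on old SQL Server versions: convert the datetime to a string in the pushed-down subquery and cast it back in Spark. Host, table, and column names are placeholders:

from pyspark.sql import functions as F

raw = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
    .option("dbtable",
            "(SELECT [index], category, "
            "CONVERT(varchar(23), date_modified, 121) AS date_modified "
            "FROM dbo.my_table) t")
    .option("user", "...")
    .option("password", "...")
    .load())

df = raw.withColumn("date_modified",
                    F.to_timestamp("date_modified", "yyyy-MM-dd HH:mm:ss.SSS"))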
0
votes
1 answer

Spark JDBC write to Teradata: multiple Spark tasks failing with a "Transaction ABORTed due to deadlock" error, resulting in stage failure

I am using Spark JDBC write to load data from Hive to a Teradata view. I am using 200 vcores and have partitioned the data into 10,000 partitions. Spark tasks are failing with the below error, resulting in stage failure. Sometimes the application finishes…
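Deadlocks on the database side usually come from too many concurrent writers, so a sketch of the usual first mitigation is to cut the write parallelism well below 10,000 and batch the inserts. Host, database, view, and credentials are placeholders:

(df.coalesce(8)                          # far fewer concurrent JDBC connections
    .write
    .format("jdbc")
    .option("url", "jdbc:teradata://td-host/DATABASE=mydb")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "mydb.my_view")
    .option("user", "...")
    .option("password", "...")
    .option("batchsize", "10000")        # larger JDBC batches per round trip
    .mode("append")
    .save())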