
I'm trying to load the result of a query from one table into another. It connects fine and executes a query to get the metadata, but no data is returned.

from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.driver.extraClassPath", "C:\\spark\\SQL\\sqljdbc_7.0\\enu\\mssql-jdbc-7.0.0.jre10.jar").getOrCreate()

SQL = "Select [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID],[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID],[PackedByPersonID],[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote],[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments],[TotalDryItems],[TotalChillerItems],[DeliveryRun],[RunPosition],[ReturnedDeliveryData],[ConfirmedDeliveryTime],[ConfirmedReceivedBy],[LastEditedBy],[LastEditedWhen] FROM [Sales].[Invoices]"

pgDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://Localhost") \
    .option("query", SQL) \
    .option("user", "dp_admin") \
    .option("Database", "WideWorldImporters") \
    .option("password", "password") \
    .option("fetchsize", 1000) \
    .load()

pgDF.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://Localhost") \
    .option("dbtable", "wwi.Sales_InvoiceLines") \
    .option("user", "dp_admin") \
    .option("Database", "DW_Staging") \
    .option("password", "password") \
    .mode("overwrite") \
    .save()

Looking at SQL Server Profiler:

exec sp_executesql N'SELECT * FROM (Select [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID],[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID],[PackedByPersonID],[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote],[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments],[TotalDryItems],[TotalChillerItems],[DeliveryRun],[RunPosition],[ReturnedDeliveryData],[ConfirmedDeliveryTime],[ConfirmedReceivedBy],[LastEditedBy],[LastEditedWhen] FROM [Sales].[Invoices]) __SPARK_GEN_JDBC_SUBQUERY_NAME_0 WHERE 1=0'

The WHERE 1=0 clause gets added and no data is returned. Why, and how can I remove it?
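For context, the statement in the Profiler trace matches the pattern visible above: the user query wrapped in a derived table with WHERE 1=0 appended, so the server returns only column metadata and zero rows. A minimal sketch of that string pattern (`schema_probe` is a hypothetical helper for illustration, not a Spark function):

```python
def schema_probe(user_query, alias="__SPARK_GEN_JDBC_SUBQUERY_NAME_0"):
    # Wrap the user's query in a derived table and append WHERE 1=0,
    # mirroring the statement seen in the Profiler trace: the server
    # returns the result-set's column metadata but zero rows.
    return f"SELECT * FROM ({user_query}) {alias} WHERE 1=0"

probe = schema_probe("Select [InvoiceID] FROM [Sales].[Invoices]")
print(probe)
# SELECT * FROM (Select [InvoiceID] FROM [Sales].[Invoices]) __SPARK_GEN_JDBC_SUBQUERY_NAME_0 WHERE 1=0
```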

  • Hi, can you try without the fetchsize option, please? – Sanpas Jan 14 '19 at 21:23
  • Removing the fetchsize option doesn't change the result: still the same WHERE clause. Apparently the WHERE clause is how Spark reads the metadata from the table and maps the data types to the DataFrame. But I still don't know why the DataFrame is not populated. – Carsten Klausman Jan 15 '19 at 05:59
  • Have you tried the SELECT from SSMS? I don't know Spark, but by removing pieces of code step by step you can see what is going wrong. Try executing SELECT 1, 'test' from the query too. Is your Invoices table in the dbo schema or another one? – Sanpas Jan 15 '19 at 06:13
  • I think you have to load the data table first and then perform the query. – Sanpas Jan 15 '19 at 06:21
  • Try using .option("dbtable", query) in place of .option("query", query). Look at this https://stackoverflow.com/questions/52487007/execute-query-on-sqlserver-using-spark-sql – Sanpas Jan 15 '19 at 06:25
  • The query runs fine in SSMS. I've also tried the dbtable option; it still adds the WHERE clause. If I put in a pgDF.show(1000), it returns 1000 rows, and it runs two queries on the SQL Server: first the empty one and then a valid SQL statement without the WHERE clause. – Carsten Klausman Jan 15 '19 at 06:44
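The behaviour described in the last comment (the empty probe first, then the real query once .show() runs) is Spark's lazy evaluation: load() only records the read, and rows are fetched when an action such as show(), count(), or a write ending in save() runs. A rough plain-Python analogy using a generator (illustrative only, not the Spark API):

```python
def jdbc_read(query):
    # Stand-in for spark.read.format("jdbc")...load(): creating the
    # generator records the plan; nothing runs against the server yet.
    def run():
        print("real query executed")   # would hit the database here
        yield from [("row1",), ("row2",)]
    return run()

plan = jdbc_read("SELECT ...")   # lazy: no query has executed yet
rows = list(plan)                # the "action": now the query runs
print(len(rows))
```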
