0

The following code throws error:

X = "CREATE OR REPLACE TABLE Invoices (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/adls/DQD/udl/Invoices/'; ALTER TABLE Invoices ADD COLUMN DQ_Check_Op SMALLINT"

spark.sql(X)

enter image description here enter image description here

But, with magic command, inside a cell, it runs fine:

%sql
CREATE OR REPLACE TABLE Invoices (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/adls/DQD/udl/Invoices/'; ALTER TABLE Invoices ADD COLUMN DQ_Check_Op SMALLINT

enter image description here

What am I doing wrong here?

SouravA
  • 5,147
  • 2
  • 24
  • 49

2 Answers2

0

The reason you are getting a ParseException in spark.sql is because of the usage of multiple SQL query statements (CREATE and ALTER) with the help of ; . Pyspark is not reading this query in the same way as Databricks SQL. Databricks SQL is able to parse and execute multiple queries separated by a semicolon. But Pyspark throws a ParseException while reading any statement written after semicolon.

So, using a single query with spark.sql is recommended (single line query or multiline query) instead of using a single spark.sql to execute multiple SQL queries.

The workaround using Pyspark SQL would be to use one spark.sql for each SQL query. You can modify your code as following:

X = "CREATE OR REPLACE TABLE Invoices (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/repro/';"

Y = "ALTER TABLE Invoices ADD COLUMN DQ_Check_Op SMALLINT"

z = "SELECT * FROM Invoices"

spark.sql(X)
spark.sql(Y)
spark.sql(z)

enter image description here

Saideep Arikontham
  • 5,558
  • 2
  • 3
  • 11
  • 1
    That's not quite right. In my actual example, my table wasn't in the default database. After I wrapped my database and table name with ` (backtick), the error went off. I didn't think that's a relevant detail, so I didn't mention a non-default database. But, the premise of your answer is incorrect that you can't pass multiple SQL query statements to spark.sql. – SouravA Jun 07 '22 at 11:00
  • 1
    Hey @SouravA, the additional query after semicolon was causing the issue while I tried to reproduce it. Thank you for correcting me. Consider posting the answer in detail so it will be helpful. – Saideep Arikontham Jun 07 '22 at 11:32
0

In my actual example, my table wasn't in the default database - I didn't think that was a relevant detail. After I wrapped my database and table name with ` (backtick), the error went off.

So, the following helped:

CREATE TABLE `sch`.`Invoices` (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/repro/';ALTER TABLE `sch`.`Invoices` ADD COLUMN DQ_Check_Op SMALLINT"
SouravA
  • 5,147
  • 2
  • 24
  • 49