Except command not working in databrick sql (spark sql)

Question

I have written this except query to get difference in record from both hive tables from databricks notebook.(I am trying to get result as we get in mssql ie only difference in resultset)

select PreqinContactID,PreqinContactName,PreqinPersonTitle,EMail,City 
  from preqin_7dec.PreqinContact where filename='InvestorContactPD.csv'
except
  select CONTACT_ID,NAME,JOB_TITLE,EMAIL,CITY 
  from preqinct.InvestorContactPD where contact_id in (
    select PreqinContactID from preqin_7dec.PreqinContact 
    where filename='InvestorContactPD.csv')

But the result set returned is also having matching records.The record which i have shown above is coming in result set but when i checked it separately based on contact_id it is same.so I am not sure why except is returning the matching record also.

Just wanted to know how we can use except or any difference finding command in databrick notebook by using sql.

I want to see nothing in result set if source and target data is same.

score 1 · Accepted Answer · answered Dec 29 '20 at 22:25

1

EXCEPT works perfectly well in Databricks as this simple test will show:

val df = Seq((3445256, "Avinash Singh", "Chief Manager", "asingh@gmail.com", "Mumbai"))
  .toDF("contact_id", "name", "job_title", "email", "city")


// Save the dataframe to a temp view
df.createOrReplaceTempView("tmp")
df.show

The SQL test:

%sql
SELECT *
FROM tmp
EXCEPT 
SELECT *
FROM tmp;

This query will yield no results. Is it possible you have some leading or trailing spaces for example? Spark is also case-sensitive so that could also be causing your issue. Try a case-insensitive test by applying the LOWER function to all columns, eg

answered Dec 29 '20 at 22:25

wBob

13,710
3
20
37

For column name , do we need to keep the same column name by using alias. or we can keep anything with correct sequence. – Avinash Singh Jan 05 '21 at 08:11
like - select contactid,name from tablea except select idno,clientname from tableb;can it work? – Avinash Singh Jan 05 '21 at 08:13
Column names do not need to be the same. – wBob Jan 05 '21 at 08:16
Ok Then. Let me try as after trying lower case again i am getting matched records in difference. – Avinash Singh Jan 08 '21 at 15:03

Except command not working in databrick sql (spark sql)

1 Answers1