1

I have written this except query to get difference in record from both hive tables from databricks notebook.(I am trying to get result as we get in mssql ie only difference in resultset)

select PreqinContactID,PreqinContactName,PreqinPersonTitle,EMail,City 
  from preqin_7dec.PreqinContact where filename='InvestorContactPD.csv'
except
  select CONTACT_ID,NAME,JOB_TITLE,EMAIL,CITY 
  from preqinct.InvestorContactPD where contact_id in (
    select PreqinContactID from preqin_7dec.PreqinContact 
    where filename='InvestorContactPD.csv')

enter image description here

But the result set returned is also having matching records.The record which i have shown above is coming in result set but when i checked it separately based on contact_id it is same.so I am not sure why except is returning the matching record also.

Just wanted to know how we can use except or any difference finding command in databrick notebook by using sql.

I want to see nothing in result set if source and target data is same.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Avinash Singh
  • 63
  • 1
  • 12

1 Answers1

1

EXCEPT works perfectly well in Databricks as this simple test will show:

val df = Seq((3445256, "Avinash Singh", "Chief Manager", "asingh@gmail.com", "Mumbai"))
  .toDF("contact_id", "name", "job_title", "email", "city")


// Save the dataframe to a temp view
df.createOrReplaceTempView("tmp")
df.show

The SQL test:

%sql
SELECT *
FROM tmp
EXCEPT 
SELECT *
FROM tmp;

This query will yield no results. Is it possible you have some leading or trailing spaces for example? Spark is also case-sensitive so that could also be causing your issue. Try a case-insensitive test by applying the LOWER function to all columns, eg

Results

wBob
  • 13,710
  • 3
  • 20
  • 37