I have two tables/dataframes: A and B.
A has the following columns: cust_id, purch_date
B has two columns: cust_id and col1
(col1 is not needed)
The following sample shows the content of each table:
Table A
cust_id purch_date
34564 2017-08-21
34564 2017-08-02
34564 2017-07-21
23847 2017-09-13
23423 2017-06-19
Table B
cust_id col1
23442 x
12452 x
12464 x
23847 x
24354 x
I want to select the cust_id and the first day of the month of purch_date, keeping only the cust_ids that are not present in B.
This can be achieved in SQL with the following command:
select a.cust_id, trunc(purch_date, 'MM') as mon
from a
left join b
on a.cust_id = b.cust_id
where b.cust_id is null
group by a.cust_id, mon;
The following will be the output:
cust_id mon
34564 2017-08-01
34564 2017-07-01
23423 2017-06-01
I tried the following to implement the same in Scala:
import org.apache.spark.sql.functions._
a = spark.sql("select * from db.a")
b = spark.sql("select * from db.b")
var out = a.join(b, Seq("cust_id"), "left")
.filter("col1 is null")
.select("cust_id", trunc("purch_date", "month"))
.distinct()
But I am getting various errors, such as:
error: type mismatch; found: StringContext required: ?{def $: ?}
I am stuck here and couldn't find enough documentation/answers on the net.
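Update: reading the Spark API docs, I suspect two issues in my attempt: `select` cannot mix `String` and `Column` arguments in one call (`trunc` returns a `Column`), and the `StringContext` error typically appears when `$"col"` syntax is used without `import spark.implicits._`. Also, a `left_anti` join expresses "not in B" directly, so the null filter becomes unnecessary. Here is a sketch of what I think should work (not yet verified; assumes Spark 2.x, where `trunc(col, "month")` truncates a date to the first day of its month):

```scala
import org.apache.spark.sql.functions._

val a = spark.sql("select * from db.a")
val b = spark.sql("select * from db.b")

// left_anti keeps only rows of a whose cust_id has no match in b,
// replacing the left join + "b.cust_id is null" filter
val out = a.join(b, Seq("cust_id"), "left_anti")
  .select(col("cust_id"), trunc(col("purch_date"), "month").as("mon"))
  .distinct()
```

With the sample data above, this should yield the three (cust_id, mon) rows shown in the expected output.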