I have two DataFrames. The first one, df, is shown below.
df = spark.createDataFrame(sc.parallelize([[1,1,2],[1,2,9],[2,1,2],[2,2,1],
[4,1,5],[4,2,6],[5,1,3],[5,2,8]]), ["sid","cid","Cr"])
df.show()
+---+---+---+
|sid|cid| Cr|
+---+---+---+
| 1| 1| 2|
| 1| 2| 9|
| 2| 1| 2|
| 2| 2| 1|
| 4| 1| 5|
| 4| 2| 6|
| 5| 1| 3|
| 5| 2| 8|
+---+---+---+
Next, I created df1 as follows:
df1 = spark.createDataFrame(sc.parallelize([[1,1],[1,2],[1,3], [2,1],[2,2],[2,3],[4,1],[4,2],[4,3],[5,1],[5,2],[5,3]]), ["sid","cid"])
df1.show()
+---+---+
|sid|cid|
+---+---+
| 1| 1|
| 1| 2|
| 1| 3|
| 2| 1|
| 2| 2|
| 2| 3|
| 4| 1|
| 4| 2|
| 4| 3|
| 5| 1|
| 5| 2|
| 5| 3|
+---+---+
Now I want the final output to look like the table below. If a (sid, cid) pair from df1 is also present in df, i.e. (df1.sid == df.sid) & (df1.cid == df.cid), the flag value should be 1, otherwise 0. Missing Cr values should be 0.
+---+---+---+----+
|sid|cid| Cr|flag|
+---+---+---+----+
|  1|  1|  2|   1|
|  1|  2|  9|   1|
|  1|  3|  0|   0|
|  2|  1|  2|   1|
|  2|  2|  1|   1|
|  2|  3|  0|   0|
|  4|  1|  5|   1|
|  4|  2|  6|   1|
|  4|  3|  0|   0|
|  5|  1|  3|   1|
|  5|  2|  8|   1|
|  5|  3|  0|   0|
+---+---+---+----+
Please help me with this.