I'm new to PySpark and need to compare two files on Col1 alone, then populate a new column at the end of File 1 depending on whether each row's Col1 value also appears in File 2:
1 = matched record, 0 = unmatched record
File 1:
Col1 | Col2 | ... | ColN |
---|---|---|---|
1 | abc | ... | Xxxx |
2 | abc | ... | Xxxx |
3 | abc | ... | Xxxx |
File 2:
Col1 | Col2 | ... | ColN |
---|---|---|---|
1 | abc | ... | Xxxx |
2 | abc | ... | Xxxx |
Expected output:
Col1 | Col2 | ... | ColN | Newcol |
---|---|---|---|---|
1 | abc | ... | Xxxx | 1 |
2 | abc | ... | Xxxx | 1 |
3 | abc | ... | Xxxx | 0 |