Assume these two PySpark DataFrames:

dfA

id
1
2
3
4

dfB

src,dst
2  ,3
1  ,3
3  ,4
4  ,1
7  ,3
1  ,8

How can I get this desired output:

resultDf

src,dst
2  ,3
1  ,3
3  ,4
4  ,1

Basically, I want to select the rows from dfB where both src and dst appear in the id column of dfA.

lumi

1 Answer

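I was able to get the desired result using spark.sql, after registering both DataFrames as temporary views named dfA and dfB (for example with createOrReplaceTempView):

resultDf = spark.sql("SELECT * FROM dfB WHERE src IN (SELECT id FROM dfA) AND dst IN (SELECT id FROM dfA)")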
lumi