I have a SQL query like below:
select col4, col5 from TableA where col1 = 'x'
intersect
select col4, col5 from TableA where col1 = 'y'
intersect
select col4, col5 from TableA where col1 = 'z'
How can I convert this SQL to its PySpark equivalent? I know I can create 3 DataFrames and then intersect them, like below:
df1 = spark.sql("select col4, col5 from TableA where col1 = 'x'")
df2 = spark.sql("select col4, col5 from TableA where col1 = 'y'")
df3 = spark.sql("select col4, col5 from TableA where col1 = 'z'")

df_result = df1.intersect(df2)
df_result = df_result.intersect(df3)
But I feel that's not a good approach to follow if there were more intersect queries.
Also, let's say [x, y, z] is dynamic, meaning it could grow to something like [x, y, z, a, b, ...].
Any suggestions?
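
For the dynamic case, the best I have come up with is building one DataFrame per value and folding them with reduce, roughly like the sketch below (assuming TableA is registered as a table/view and a SparkSession is available; the values list is just an example). Is chaining intersect in a loop like this reasonable, or is there a more idiomatic way?

from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assuming a Spark session is available

values = ['x', 'y', 'z']  # this list can grow dynamically

# build one DataFrame per filter value, then chain intersect over all of them
dfs = [spark.sql(f"select col4, col5 from TableA where col1 = '{v}'") for v in values]
df_result = reduce(lambda left, right: left.intersect(right), dfs)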