Select entries from TABLE A based on TABLE B in PySpark

Question

In SQL, we can select entries from TABLE A based on a column in TABLE B, as shown below:

SELECT * FROM A
WHERE NAME IN (SELECT NAME FROM B)

How can I replicate this query in PySpark without using a SQL context?

Does this answer your question? [PySpark: match the values of a DataFrame column against another DataFrame column](https://stackoverflow.com/questions/42545788/pyspark-match-the-values-of-a-dataframe-column-against-another-dataframe-column) — polkas, Oct 11 '20 at 17:48

Aditya Vikram Singh · Answer 1 · 2020-10-12T04:14:52.837


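# distinct() keeps rows of A from being duplicated when a Name appears more than once in B
df = A.join(B.select("Name").distinct(), on=["Name"], how="inner").select(A.columns)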

edited Oct 12 '20 at 04:14

answered Oct 11 '20 at 20:13

This question is about Pyspark, not about Pandas – werner Oct 11 '20 at 21:08

It's PySpark code; please check the documentation - https://dzone.com/articles/pyspark-join-explained-with-examples – Aditya Vikram Singh Oct 12 '20 at 04:16
yep, you have now changed your answer to use the Spark syntax – werner Oct 12 '20 at 18:23
