I have two PySpark DataFrames. I want to select all records from voutdf whose "hash" does not exist in vindf.tx_hash.
How can I do this with the PySpark DataFrame API? I tried a semi join, but I end up with out-of-memory errors.
voutdf = sqlContext.createDataFrame(voutRDD, ["hash", "value", "n", "pubkey"])
vindf = sqlContext.createDataFrame(vinRDD, ["txid", "tx_hash", "vout"])