I'm using pandas in Databricks, with
import pyspark.pandas as ps
After reading two tables as dataframes, df and df_aux, I'm executing the following line:
index_list = df.loc[~df['Column_A'].isin(df_aux)].index
But it raises the following error:
PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
Any ideas on how to obtain the same variable index_list using pyspark.pandas?