I'm using the pandas API on Spark (pyspark.pandas) in Databricks, imported as:

import pyspark.pandas as ps

After reading two tables into DataFrames, df and df_aux, I run the following line:

index_list = df.loc[~df['Column_A'].isin(df_aux)].index

But it raises the following error:

PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

Any ideas on how to obtain the same variable index_list using pyspark.pandas?
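
One possible direction, as a minimal sketch rather than a confirmed fix: the error suggests that isin() tries to iterate over the object it is given, which pandas-on-Spark objects do not support. Converting the lookup values to a plain Python list first avoids that iteration. The helper name index_not_in and the aux_column parameter are hypothetical; the question does not say which column of df_aux holds the values to compare against, so Column_A is only assumed here.

```python
def index_not_in(df, df_aux, column="Column_A", aux_column="Column_A"):
    """Return the index of rows in df whose `column` value is absent from df_aux.

    Works on any object exposing the pandas API used below, which includes
    pyspark.pandas DataFrames. `aux_column` is an assumption: the original
    question does not name the column of df_aux to compare against.
    """
    # Collect the lookup values to the driver as a plain Python list,
    # so isin() never has to iterate over a distributed Series.
    lookup_values = df_aux[aux_column].to_numpy().tolist()

    # Keep the rows whose value is NOT in the lookup list, then take their index.
    return df.loc[~df[column].isin(lookup_values)].index


# Usage with the DataFrames from the question:
# index_list = index_not_in(df, df_aux)
```

Note that to_numpy() pulls every value of that column onto the driver, so this only suits a df_aux that is small enough to fit in driver memory.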
