PandasNotImplementedError in Databricks

Asked Apr 27 '23 at 11:02

Active Jul 13 '23 at 23:55

Viewed 52 times

I'm using the pandas API on Spark in Databricks, imported with:

import pyspark.pandas as ps

After reading two tables into DataFrames, df and df_aux, I run the following line:

index_list = df.loc[~df['Column_A'].isin(df_aux)].index

But it raises the following error:

PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

Any ideas on how to obtain the same variable index_list using pyspark.pandas?
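
One possible approach, as a hedged sketch rather than a confirmed fix: the error message suggests that isin() ended up iterating over a pandas-on-Spark object, which pyspark.pandas does not allow. Collecting the lookup values into a plain Python list first, as the message hints with to_numpy(), avoids that iteration. The helper name index_not_in and the assumption that df_aux also has a column named Column_A are hypothetical, not taken from the original tables. Plain pandas is used here only so the snippet runs locally; Series.to_numpy(), Series.isin(), .loc and .index also exist in pyspark.pandas.

```python
import pandas as pd  # local stand-in; on Databricks use `import pyspark.pandas as ps`


def index_not_in(df, column, lookup):
    """Return the index of rows whose `column` value is absent from `lookup`.

    `lookup` is a single column (Series), not a whole DataFrame.
    """
    # Collect the lookup values into a plain Python list so that isin()
    # never has to iterate over a Series itself.
    values = lookup.to_numpy().tolist()
    return df.loc[~df[column].isin(values)].index


# Hypothetical sample data standing in for the two tables.
df = pd.DataFrame({"Column_A": [1, 2, 3, 4]})
df_aux = pd.DataFrame({"Column_A": [2, 4]})  # assumed column name

index_list = index_not_in(df, "Column_A", df_aux["Column_A"])
print(index_list.tolist())  # [0, 2]
```

Note that to_numpy() pulls every lookup value onto the driver, so this is only reasonable when df_aux is small enough to fit in memory.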

datadatadata

0 Answers