
I am trying to replace the pandas library with the pyspark.pandas library.

I tried the following (note: df is a pyspark.pandas DataFrame):

import pyspark.pandas as pd 
print(set(df["horizon"].unique()))

But I got the following error:

   print(set(df["horizon"].unique()))
  File "C:\Users\abc\Anaconda3\envs\env1\lib\site-packages\pyspark\pandas\series.py", line 6328, in __iter__
    return MissingPandasLikeSeries.__iter__(self)
  File "C:\Users\abc\Anaconda3\envs\env1\lib\site-packages\pyspark\pandas\missing\__init__.py", line 24, in unsupported_function
    class_name=class_name, method_name=method_name, reason=reason
pyspark.pandas.exceptions.PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
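The error message itself points at a workaround. What follows is a minimal sketch, assuming the goal is simply a Python set of the distinct "horizon" values and that the number of distinct values is small enough to fit on the driver. unique() still runs the deduplication in Spark. to_numpy() then collects that (already reduced) result to the driver as a NumPy array, and a regular set can be built from it. The helper name unique_as_set is hypothetical. It also accepts a plain pandas Series, whose unique() already returns an array rather than a Series. The one-line equivalent for the original code would be set(df["horizon"].unique().to_numpy()).

```python
def unique_as_set(series):
    """Return the distinct values of a pandas or pyspark.pandas Series as a set.

    Hypothetical helper. A pyspark.pandas Series cannot be iterated directly
    (__iter__ raises PandasNotImplementedError), so the distinct values are
    first collected to the driver as a NumPy array with to_numpy().
    """
    # pyspark.pandas: returns a distributed Series; pandas: returns an array
    uniques = series.unique()
    if hasattr(uniques, "to_numpy"):
        # Collect to the driver; only safe when the distinct values are few
        uniques = uniques.to_numpy()
    # tolist() turns NumPy scalars (e.g. numpy.int64) into plain Python objects
    return set(uniques.tolist())


if __name__ == "__main__":
    # Requires a working Spark installation; "ps" is just the usual alias
    import pyspark.pandas as ps

    df = ps.DataFrame({"horizon": [1, 2, 2, 3]})
    print(unique_as_set(df["horizon"]))
```

Calling unique() before to_numpy() keeps the amount of data pulled to the driver limited to the distinct values, rather than the whole column.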
user19930511

0 Answers0