
I have a Spark DF that I've converted to a pySpark Pandas DF using df.to_pandas_on_spark().
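For reference, this is roughly how the frame is set up (column names match the real data; the values below are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Spark DataFrame with the two columns involved (values made up for illustration)
    sdf = spark.createDataFrame(
        [(100.0, 12.345), (250.5, 0.789)],
        ["Income", "Fees"],
    )

    # Convert to a pandas-on-Spark (pyspark.pandas) DataFrame, as in the real pipeline
    df = sdf.to_pandas_on_spark()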

I have some logic that rounds a column, e.g.:

    df["Column_Name"] = round(df["Income"] - df["Fees"], 2)

but I get the following error:

TypeError: type Series doesn't define round method

After searching around it seems like historically (with Koalas) the fix was to use pySpark's SQL functions:

    import pyspark.sql.functions as f

    df["Column_Name"] = f.round(df["Income"] - df["Fees"], 2)

Is there no pyspark.pandas equivalent of round?
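The closest workaround I can think of is the pandas-style .round() method on the Series itself. I haven't verified this across pyspark.pandas versions, but it's the kind of equivalent I'm asking about:

    # Pandas-style method call on the pandas-on-Spark Series, instead of the builtin round()
    df["Column_Name"] = (df["Income"] - df["Fees"]).round(2)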

Because when I use pyspark.sql.functions.round I get the following error:

Invalid argument, not a string or column: 57 0.000000

I'm not sure why it gives me a "not a string" error; do people round strings? I would at least expect it to say "not a float/int/number" or similar.

All of this logic works fine when I'm using plain pandas (not to be confused with pyspark.pandas).
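For comparison, a quick plain-pandas version of the same pattern (same column names, made-up values) rounds without complaint:

    import pandas as pd

    pdf = pd.DataFrame({"Income": [100.0, 250.5], "Fees": [12.345, 0.789]})

    # The builtin round() works here because a plain pandas Series implements __round__
    pdf["Column_Name"] = round(pdf["Income"] - pdf["Fees"], 2)
    print(pdf)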
