
I need to round a column in PySpark using Banker's Rounding (where 0.5 is rounded to the nearest even number).

So far, I've tried this:

from pyspark.sql.functions import col, round as _round

df = df.withColumn(new_name, col(old_name) * col('ExchangeRate'))

df = df.select("*", _round(col(new_name)))

Even though I'm running this on Python 3+, PySpark's round() still applies the HALF_UP rounding mode. I can't use Python's built-in round() because it doesn't work on a Column object.

Is there a way to force PySpark's round() to use Banker's Rounding?

2 Answers


You can use bround from pyspark.sql.functions. From the docs:

Round the given value to scale decimal places using HALF_EVEN rounding mode if scale >= 0 or at integral part when scale < 0.

import pyspark.sql.functions as F

df = spark.createDataFrame([(-1.5,), (-1.2,), (-0.6,), (-0.5,), (0.4,), (0.5,), (1.5,)], ["val"])

df.withColumn('round', F.round('val')).withColumn("bround", F.bround('val')).show()

+----+-----+------+
| val|round|bround|
+----+-----+------+
|-1.5| -2.0|  -2.0|
|-1.2| -1.0|  -1.0|
|-0.6| -1.0|  -1.0|
|-0.5| -1.0|   0.0|
| 0.4|  0.0|   0.0|
| 0.5|  1.0|   0.0|
| 1.5|  2.0|   2.0|
+----+-----+------+
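
To apply this to the converted column from the question, a minimal sketch could look like the following (it reuses the question's new_name, old_name, and ExchangeRate names; the scale of 2 decimal places is an assumption, e.g. rounding a currency amount to cents):

from pyspark.sql.functions import bround, col

# Multiply by the exchange rate, then round with HALF_EVEN (Banker's Rounding).
# The scale of 2 is an assumed example; bround defaults to scale 0.
df = df.withColumn(new_name, col(old_name) * col('ExchangeRate'))
df = df.withColumn(new_name, bround(col(new_name), 2))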
– ScootCork
import pyspark.sql.functions as func

df = spark.createDataFrame(
    [(0.0, 0.2, 3.45631),
     (0.4, 1.4, 2.82945),
     (0.5, 1.9, 7.76261),
     (0.6, 0.9, 2.76790),
     (1.2, 1.0, 9.87984)],
    ["col1", "col2", "col3"])

df2 = df.withColumn("col4", func.round(df["col3"]))
df2.show()

OUTPUT:

(The output was posted as an image: df2 with an added col4 column containing col3 rounded to the nearest integer.)

– SherKhan
    Could you add some words explaining how this forces `round` to use Banker's Rounding? If you add an item with value exactly `2.5` to column `col3`, what output does that give in `col4`? (The questioner is looking for a result of `2.0` in that case, but I suspect your code gives `3.0`.) – Mark Dickinson Jul 09 '22 at 10:21