I'm new to the Spark world and I would like to compute an extra column with the integer modulo in PySpark. I could not find this operator among the built-in operators.
Does anyone have any idea?
You can simply use the % operator between columns, as you would in plain Python:
from pyspark.sql.functions import col
df = spark.createDataFrame([(6,3), (7, 3), (13,6), (5, 0)], ["x", "y"])
df.withColumn("mod", col("x") % col("y")).show()
#+---+---+----+
#| x| y| mod|
#+---+---+----+
#| 6| 3| 0|
#| 7| 3| 1|
#| 13| 6| 1|
#| 5| 0|null|
#+---+---+----+
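As the last row shows, a zero divisor yields null rather than raising an error. If you would rather substitute a default value, one possible sketch (the default of 0 here is an arbitrary choice, not required) is:
from pyspark.sql.functions import coalesce, col, lit
# replace the null produced by y = 0 with a default of 0 (arbitrary choice)
df.withColumn("mod", coalesce(col("x") % col("y"), lit(0))).show()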
Alternatively, you can use the Spark built-in function mod or the % operator with SQL syntax:
from pyspark.sql.functions import expr
# using mod function
df.withColumn("mod", expr("mod(x, y)")).show()
# using SQL %
df.withColumn("mod", expr("x % y")).show()