
Here is my sample code. I am expecting decimal(16,4) as the return type from the UDF, but I am getting decimal(38,18).

Is there any better solution?

I am NOT looking for the answer "cast(price as decimal(16,4))", as I have other business logic in my UDF beyond just casting.

Thanks in advance.

import scala.util.Try
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.Decimal

val spark = SparkSession.builder().master("local[*]").appName("Test").getOrCreate()
import spark.implicits._

val stringToDecimal = udf((s: String, precision: Int, scale: Int) => {
  // Returns None (rendered as null) when the string cannot be parsed as a decimal
  Try(Decimal(BigDecimal(s), precision, scale)).toOption
})

spark.udf.register("stringToDecimal", stringToDecimal)

val inDf = Seq(
  ("1", "864.412"),
  ("2", "1.600"),
  ("3", "2,56")).toDF("id", "price")

val outDf = inDf.selectExpr("id", "stringToDecimal(price, 16, 4) as price")
outDf.printSchema()
outDf.show()

------------------output----------------
root
  |-- id: string (nullable = true)
  |-- price: decimal(38,18) (nullable = true)

+---+--------------------+
| id|               price|
+---+--------------------+
|  1|864.4120000000000...|
|  2|1.600000000000000000|
|  3|                null|
+---+--------------------+
2 Answers


As of Spark 3.0 and below, you can't set the precision and scale of a decimal returned by a Spark user-defined function (UDF), because the precision and scale are erased at the UDF's creation.

Explanation

To create a UDF, either by calling the udf function with a lambda/function as argument or by registering the lambda/function directly as a UDF with the sparkSession.udf.register method, Spark needs to convert the argument types and the return type of the lambda/function to Spark's DataType.

To do so, Spark uses the method schemaFor in the class ScalaReflection to map Scala types to Spark's DataType.

For the BigDecimal or Decimal types, the mapping is done as follows:

case t if isSubtype(t, localTypeOf[BigDecimal]) =>
  Schema(DecimalType.SYSTEM_DEFAULT, nullable = true)
case t if isSubtype(t, localTypeOf[java.math.BigDecimal]) =>
  Schema(DecimalType.SYSTEM_DEFAULT, nullable = true)
case t if isSubtype(t, localTypeOf[Decimal]) =>
  Schema(DecimalType.SYSTEM_DEFAULT, nullable = true)

This means that when your lambda/function returns either a BigDecimal or a Decimal, the return type of the UDF will be DecimalType.SYSTEM_DEFAULT, which is a decimal with a precision of 38 and a scale of 18:

val MAX_PRECISION = 38
...
val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)

Conclusion

Thus, every time you turn a lambda or a function that returns a Decimal or a BigDecimal into a Spark UDF, the precision and scale are replaced with the default precision of 38 and scale of 18.

So your only option is to follow the other answer and cast the returned value of the UDF when calling it.
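
For example, a minimal sketch based on the question's code (the inDf DataFrame, the registered stringToDecimal UDF and the price column all come from the question):

// Cast the decimal(38,18) returned by the UDF back to the target decimal(16,4)
val outDf = inDf.selectExpr(
  "id",
  "CAST(stringToDecimal(price, 16, 4) AS DECIMAL(16,4)) AS price")

outDf.printSchema()
// root
//  |-- id: string (nullable = true)
//  |-- price: decimal(16,4) (nullable = true)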


Spark associates Decimal with decimal(38,18). You need an explicit cast:

$"price".cast(DataTypes.createDecimalType(32,2))