
I am trying to cast an array to Decimal(30,0) to use it dynamically in a select:

WHERE array_contains(myArrayUDF(), someTable.someColumn)

However, when casting with:

val arrIds = someData.select("id")
  .withColumn("id", col("id").cast(DecimalType(30, 0)))
  .collect()
  .map(_.getDecimal(0))

Databricks accepts this, but the resulting signature already looks wrong: intArrSurrIds: Array[java.math.BigDecimal] = Array(2181890000000,...) // i.e. a BigDecimal, not a Decimal(30,0)

This results in the error below:

Error in SQL statement: AnalysisException: cannot resolve.. due to data type mismatch: Input to function array_contains should have been array followed by a value with same element type, but it's [array<decimal(38,18)>, decimal(30,0)]

How do you correctly cast to decimal(30,0) in a Spark Databricks Scala notebook, rather than ending up with decimal(38,18)?

Any help appreciated!

Leigh Mathieson

1 Answer


You can make arrIds an Array[Decimal] using the code below:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Decimal, DecimalType}

val arrIds = someData.select("id")
  .withColumn("id", col("id").cast(DecimalType(30, 0)))
  .collect()
  // wrap each java.math.BigDecimal in Spark's Decimal, keeping precision 30 and scale 0
  .map(row => Decimal(row.getDecimal(0), 30, 0))

However, it will not solve your problem, because you lose the precision and scale once you create your user-defined function, as I explain in this answer.
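
For illustration, here is a minimal sketch of that erasure (assuming the arrIds built above and the spark session available in a Databricks notebook). Registering the UDF re-infers the element type as Spark's default decimal(38,18), which is exactly the array type reported in your error message:

// arrIds is the Array[Decimal] built above with precision 30 and scale 0.
// Registration infers the return type by reflection, and Decimal maps to
// the default DecimalType(38, 18), so the (30, 0) you set is discarded.
spark.udf.register("myArrayUDF", () => arrIds)

spark.sql("SELECT myArrayUDF() AS ids").printSchema()
// root
//  |-- ids: array (nullable = true)
//  |    |-- element: decimal(38,18) (containsNull = true)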

To solve your problem, you need to cast the column someTable.someColumn to a Decimal with the same precision and scale as the UDF's return type. So your WHERE clause should be:

WHERE array_contains(myArrayUDF(), cast(someTable.someColumn as Decimal(38, 18)))
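
Putting it together, an end-to-end sketch (reusing someTable, someColumn, and the UDF name from your question; adjust to your actual table and column) could look like:

// Register the UDF that returns the Array[Decimal] built earlier.
spark.udf.register("myArrayUDF", () => arrIds)

// Cast the column to the array's actual element type, decimal(38,18),
// so array_contains compares values of the same type.
val matches = spark.sql("""
  SELECT *
  FROM someTable
  WHERE array_contains(myArrayUDF(), cast(someColumn as decimal(38, 18)))
""")
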
Vincent Doba
  • Thanks Vincent! As per the previous link you mentioned, would it be possible to recast the UDF explicitly to Decimal(30,0) instead (rather than the other way around)? E.g.: val arrayDecIdsUdf = () => intArrSurrIds.cast(DataTypes.createDecimalType(30,0)) // recast here? spark.udf.register("myUDF", arrayDecIdsUdf) – Leigh Mathieson Nov 05 '20 at 08:29
  • No, because when you create the UDF you erase the Decimal precision and scale. The erasure takes place in the `spark.udf.register` method, so any precision and scale changes you make beforehand are useless. – Vincent Doba Nov 05 '20 at 11:05