I wrote a UDF and use it in SQL in a Spark paragraph in Zeppelin, like this:
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

spark.udf.register("covertMd5", (input: String) => {
  if (input == null || input.isEmpty) null    // null or empty input maps to null
  else if (input.startsWith("_")) input       // values starting with "_" are returned unchanged
  else {
    // MD5 is a required algorithm on every Java SE platform, so getInstance should not throw here
    val md5 = MessageDigest.getInstance("MD5")
    // Use an explicit charset instead of the platform default
    val md5Bytes: Array[Byte] = md5.digest(input.getBytes(StandardCharsets.UTF_8))
    // Hex-encode each byte, zero-padding values below 16 to two digits
    val hexValue = new StringBuilder
    for (b <- md5Bytes) {
      val v = b & 0xff
      if (v < 16) hexValue.append('0')
      hexValue.append(Integer.toHexString(v))
    }
    hexValue.toString
  }
})
spark.sql("""select covertMd5(col) from table""")
This paragraph runs normally about 10 or more times, and then it reports this error:
java.lang.InternalError: Malformed class name
    at java.lang.Class.getSimpleBinaryName(Class.java:1450)
    at java.lang.Class.getSimpleName(Class.java:1309)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1048)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1047)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1000)
    at
    ... 97 elided
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -83
    at java.lang.String.substring(String.java:1931)
    at java.lang.Class.getSimpleBinaryName(Class.java:1448)
    ... 164 more
Restarting the Zeppelin daemon fixes the problem temporarily, but it happens again after another 10 or so executions.
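To look at this more closely, here is a small sketch (not a confirmed diagnosis). The trace fails inside `Class.getSimpleName`, which `ScalaUDF.udfErrorMessage` calls, presumably on the class of the registered function. The hypothetical helper `UdfClassNames.safeSimpleName` makes the same call but catches the `InternalError` and falls back to `getName`. That way the generated class name can be printed in each run and compared as the run count grows.

```scala
object UdfClassNames {
  // Calls Class.getSimpleName (the call that fails in the trace above) but
  // catches the InternalError and reports the full binary name instead.
  // "UdfClassNames" and "safeSimpleName" are illustrative names, not a Spark API.
  def safeSimpleName(c: Class[_]): String =
    try c.getSimpleName
    catch {
      case e: InternalError =>
        s"getSimpleName failed (${e.getMessage}) for ${c.getName}"
    }
}
```

In a Zeppelin paragraph, one could define a function `f` the same way as the UDF (for example `val f = (input: String) => input`) and print both `f.getClass.getName` and `UdfClassNames.safeSimpleName(f.getClass)` on each run. If the name keeps getting longer or more deeply nested between runs, that would fit the error appearing only after many executions, but this is an assumption to check, not an established cause.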
Does anyone have an idea what causes this? Thanks very much.