I have a column in my database called id, which contains INT values.

e.g:

{id: 123456}  
{id: 234567}  
{id: 345678}  
{id: 456789}  
{id: 567890} 

I need to replace these values with their encrypted values by calling a function encryptId(id). encryptId takes a LONG and returns a STRING.

My thought process is to use .withColumn to replace the current id column with the encrypted value.

db.withColumn("id", encryptId(col("id"))) gives me the error

type mismatch. Required: Long, found: column

db.withColumn("id", encryptId("id")) gives me the error

type mismatch. Required: Long, found: string

Am I doing this incorrectly? :(

Gabio
  • Not sure if it's related, but on the second command you are missing a double quote at the end of "id": `db.withColumn("id", encryptId("id"))` instead of `db.withColumn("id, encryptId("id"))` – Jaime Caffarel Jul 09 '22 at 16:22

1 Answer

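It seems that you didn't wrap encryptId as a Spark UDF. A plain Scala function that expects a Long cannot be called with a Column (or with a column name as a String), which is why both attempts fail with a type mismatch. Let's assume that encryptId is defined as follows: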

val encryptId = (id: Long) => {
    // dummy implementation for simplicity   
    id.toString
}

You can turn encryptId into a UDF:

import org.apache.spark.sql.functions.{col, udf}

val encryptIdUdf = udf(encryptId)

Now you can use encryptIdUdf as follows:

db.withColumn("id", encryptIdUdf(col("id")))
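
For completeness, here is a minimal end-to-end sketch of the same idea that can be run locally. It rests on a few assumptions: the encryptId body is a hypothetical placeholder (the real encryption logic is not shown in the question), the object and method names (EncryptIdExample, withEncryptedId) are made up for illustration, and the local SparkSession settings are arbitrary. The explicit cast to long is a defensive choice because the question says the column holds INTs while encryptId takes a Long; it may not be strictly required.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object EncryptIdExample {
  // Hypothetical stand-in for the real encryptId (Long => String).
  def encryptId(id: Long): String = s"enc-$id"

  // Wrap the plain Scala function so Spark can apply it to each row.
  val encryptIdUdf = udf((id: Long) => encryptId(id))

  // Replace the id column with its encrypted value.
  // The cast is defensive: the column is described as INT, the function takes Long.
  def withEncryptedId(db: DataFrame): DataFrame =
    db.withColumn("id", encryptIdUdf(col("id").cast("long")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; settings are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("encrypt-id-example")
      .getOrCreate()
    import spark.implicits._

    // Sample ids taken from the question.
    val db = Seq(123456, 234567, 345678).toDF("id")
    withEncryptedId(db).show()

    spark.stop()
  }
}
```

After the transformation the id column has type string, since the UDF returns a String.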
Gabio