5

I have a dataset with 5 Million records, I need to replace all the values in column using startsWith() supplying multiple or and conditions.

This code works for a single condition:

df2.withColumn("Deposits", when(col("Deposits").startsWith("0"),"1.1").otherwise(col("Deposits"))).show()

I need to do this:

df2.withColumn("Deposits", when(col("Deposits").startsWith("0"||"1"),"1.1").otherwise(col("Deposits")))
blackbishop
  • 30,945
  • 11
  • 55
  • 76
SK Sayyad
  • 87
  • 9

2 Answers2

4

Use rlike with regex ^(1|0) (starts with 1 or 0):

df2.withColumn(
  "Deposits",
  when(col("Deposits").rlike("^(1|0)"), "1.1").otherwise(col("Deposits"))
).show

If you want to update Deposits column depending on the startwith condition, you can chain multiple when expressions like this:

val depositsCol = Seq(
  ("1", "1.1"),
  ("2", "1.2")
).foldLeft(col("Deposits")) { case (acc, (start, value)) =>
  when(col("Deposits").startsWith(start), value).otherwise(acc)
}

df2.withColumn("Deposits", depositsCol).show
blackbishop
  • 30,945
  • 11
  • 55
  • 76
3

You can separate the two conditions and combine them with || (or):

df2.withColumn(
    "Deposits",
    when(
        col("Deposits").startsWith("0") || col("Deposits").startsWith("1"),
        "1.1"
    ).otherwise(
        col("Deposits")
    )
)
mck
  • 40,932
  • 13
  • 35
  • 50