
I am looking for a way to read a Parquet file, replace values with null in some columns where a condition matches, and write the data back to the original file.

Using Spark it's pretty easy, but I want to achieve this without it.

This is how I would do it in Spark:

import org.apache.spark.sql.functions.{col, lit, when}
import spark.implicits._

val path = "path/to/my.parquet"
val nullableColumns = Seq("column1", "column2")
val some_id = 42L // placeholder: the id whose rows should be nulled

val df = spark.read.parquet(path)

// Replace each target column with null on rows where id_column matches.
val dfNulled = nullableColumns.foldLeft(df) {
  case (acc, column_name) =>
    acc.withColumn(column_name, when($"id_column" === some_id, lit(null)).otherwise(col(column_name)))
}

dfNulled.write.mode("overwrite").parquet(path)

Any help on how to do this in Scala without Spark would be appreciated.
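
For reference, here is a rough sketch of what I imagine this could look like with the parquet-avro library (org.apache.parquet:parquet-avro), reading all records into memory, nulling the fields, and writing to a temporary file before replacing the original. The id_column and some_id names are the same placeholders as in the Spark version, and I'm assuming the target columns are declared nullable (a union with "null") in the Avro schema and that the file is non-empty. I don't know if this is the idiomatic approach:

import scala.collection.mutable.ArrayBuffer
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

val path = new Path("path/to/my.parquet")
val tmp  = new Path("path/to/my.parquet.tmp")
val nullableColumns = Seq("column1", "column2")
val some_id = 42L // placeholder: the id whose rows should be nulled

// Read every record into memory (only reasonable for small files).
val reader  = AvroParquetReader.builder[GenericRecord](path).build()
val records = ArrayBuffer.empty[GenericRecord]
var rec = reader.read()
while (rec != null) { records += rec; rec = reader.read() }
reader.close()

// Null out the target columns where the condition matches.
// Assumes id_column is a long; adjust the comparison to the actual type.
records.foreach { r =>
  if (r.get("id_column") == some_id)
    nullableColumns.foreach(c => r.put(c, null))
}

// Write to a temp file using the schema from the data, then swap it in.
val schema = records.head.getSchema
val writer = AvroParquetWriter.builder[GenericRecord](tmp).withSchema(schema).build()
records.foreach(writer.write)
writer.close()

val fs = tmp.getFileSystem(new Configuration())
fs.delete(path, false)
fs.rename(tmp, path)

Writing to a temporary path and renaming avoids clobbering the file while it is still being read, but holding all records in memory obviously won't scale to large files, which is part of why I'm asking.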

