I'm looking for a way to read a Parquet file, replace the values in some columns with null wherever a condition matches, and write the data back to the original file.
With Spark this is easy, but I want to achieve it without Spark.
This is how I would do it in Spark:
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._

val path = "path/to/my.parquet"
val nullableColumns = Seq("column1", "column2")

val df = spark.read.parquet(path)
// For each target column, set it to null where the id matches, otherwise keep the value.
val dfNulled = nullableColumns.foldLeft(df) {
  case (acc, columnName) =>
    acc.withColumn(columnName, when($"id_column" === some_id, null).otherwise(col(columnName)))
}
dfNulled.write.mode("overwrite").parquet(path)
How can I do this in Scala without Spark?
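For what it's worth, my understanding is that Parquet files are immutable, so even without Spark I would have to rewrite the file rather than update it in place. Below is a rough, untested sketch of the direction I've been considering, using parquet-avro (org.apache.parquet:parquet-avro plus a Hadoop client on the classpath). Here some_id is the same placeholder as above, and I'm assuming the target columns are declared optional (nullable) in the schema:

import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}
import org.apache.parquet.hadoop.ParquetWriter

val source = new Path("path/to/my.parquet")
val temp   = new Path("path/to/my.parquet.tmp")
val nullableColumns = Seq("column1", "column2")

val reader = AvroParquetReader.builder[GenericRecord](source).build()
var writer: ParquetWriter[GenericRecord] = null
try {
  var record = reader.read()
  while (record != null) {
    // Create the writer lazily, once the schema of the first record is known.
    if (writer == null)
      writer = AvroParquetWriter.builder[GenericRecord](temp)
        .withSchema(record.getSchema)
        .build()
    // Null out the target columns when the id matches.
    // (Note: Avro may return string fields as org.apache.avro.util.Utf8.)
    if (record.get("id_column") == some_id)
      nullableColumns.foreach(c => record.put(c, null))
    writer.write(record)
    record = reader.read()
  }
} finally {
  reader.close()
  if (writer != null) writer.close()
}
// Afterwards the temp file would replace the original, e.g. via FileSystem.rename.

Since this streams records one at a time, memory use should stay bounded, but I'm not sure it's the idiomatic way to do it, hence the question.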