
I have a CSV file in which one column sometimes contains a newline character (\n or \r). I need to parse this file into a DataFrame, ignoring or removing those characters, BUT the values are NOT surrounded by quotes; otherwise I could simply add .option("multiline", true).

Similar question, but with values surrounded by quotes: Escape New line character in Spark CSV read

Sample Code:

val df = spark.read
  .option("wholeFile", true)
  .option("multiline", true)
  .option("header", true)
  .option("inferSchema", "true")
  .option("dateFormat", "yyyy-MM-dd")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .csv("test.csv")

sample input:

id,comment,name
1,good,bob
2,bad
,tim
3,fine,sarah

sample output:

id comment name
1 good bob
2 bad null
tim null null
3 fine sarah

desired output:

id comment name
1 good bob
2 bad tim
3 fine sarah
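
One possible direction, sketched here only to illustrate the desired behaviour: read the file as plain text, glue consecutive physical lines together until each logical row has as many separators as the header, and only then hand the repaired lines to the CSV parser. The names `RepairCsv` and `repairLines` are hypothetical, not part of Spark. The sketch assumes the file is small enough to collect to the driver, that fields never contain the separator or quotes, and that a stray line break never falls inside the last column (that case cannot be detected by counting separators).

```scala
import org.apache.spark.sql.SparkSession

object RepairCsv {
  // Hypothetical helper: concatenate physical lines until the buffered text
  // contains at least as many separators as the header line does.
  def repairLines(lines: Seq[String], sep: Char = ','): Seq[String] = lines match {
    case Seq() => Seq.empty
    case header +: rest =>
      val expectedSeps = header.count(_ == sep)
      val out = Seq.newBuilder[String]
      val buf = new StringBuilder
      for (raw <- rest) {
        // Drop a trailing carriage return, then join without the line break.
        buf.append(raw.stripSuffix("\r"))
        if (buf.count(_ == sep) >= expectedSeps) {
          out += buf.toString
          buf.clear()
        }
      }
      // Keep any incomplete trailing fragment rather than silently dropping it.
      if (buf.nonEmpty) out += buf.toString
      header.stripSuffix("\r") +: out.result()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repair-csv")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Assumption: the whole file fits in driver memory.
    val rawLines = spark.read.textFile("test.csv").collect().toSeq

    // DataFrameReader.csv also accepts a Dataset[String] (Spark 2.2+).
    val df = spark.read
      .option("header", true)
      .option("inferSchema", true)
      .csv(repairLines(rawLines).toDS())

    df.show()
    spark.stop()
  }
}
```

On the sample input above, `repairLines` would merge `2,bad` and `,tim` into `2,bad,tim` and leave the other rows unchanged. For files too large to collect, the same merging idea would have to be applied per file or per partition, which this sketch does not attempt.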

