I have a CSV file with around 15 columns, and I would like to:
- skip the first 2 lines and use a custom schema
- remove the double quotes from the row values.
The CSV looks like this:
Header1 blah blah
Header2 blah blah
Name1;"1,456";"City1";"3";"pet"
Name2;"3,450";"City2";"4";"not pet"
delimiter = ";"
salesDF = spark.read.format("csv") \
.option("quote", "") \
.option("sep", delimiter) \
.load("sales_2018.csv")
salesDF = salesDF.replace("\"","")
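For comparison, here is a sketch that uses Python's standard `csv` module (not Spark) on the two data rows above. When quote handling is turned off, the quote characters stay inside each value. Spark's CSV documentation describes the same behavior for an empty `quote` option. When `"` is the quote character, the parser strips the quotes and keeps `1,456` as one field. In Spark, the analogous change would presumably be to drop `.option("quote", "")` so that the default quote character `"` applies. Also note that `DataFrame.replace` matches whole cell values, not substrings, so removing characters inside a value would need something like `pyspark.sql.functions.regexp_replace` instead.

```python
import csv

# The two data rows from the question (header lines omitted).
rows = [
    'Name1;"1,456";"City1";"3";"pet"',
    'Name2;"3,450";"City2";"4";"not pet"',
]

# Quote handling turned off: quote characters are kept as part of each value,
# roughly analogous to .option("quote", "") in Spark.
no_quoting = list(csv.reader(rows, delimiter=";", quoting=csv.QUOTE_NONE))

# Quote character '"': the parser consumes the quotes, and the comma inside
# "1,456" stays part of a single field.
with_quoting = list(csv.reader(rows, delimiter=";", quotechar='"'))

print(no_quoting[0])    # ['Name1', '"1,456"', '"City1"', '"3"', '"pet"']
print(with_quoting[0])  # ['Name1', '1,456', 'City1', '3', 'pet']
```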
I tried the code above to remove the quotes from the CSV. The delimiter works, but the quotes are not removed.
The result is below. Instead of removing the quotes, it added more of them:
Header1 blah blah
Header2 blah blah
"Name1;""1,456"";""City1"";""3"";""pet""
"Name2;""3,450"";""City2"";""4"";""not pet""
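The doubled quotes match ordinary CSV escaping. A writer that outputs a value containing the quote character wraps the value in quotes and doubles each embedded quote. The write step is not shown, so the following is only an assumption: suppose each whole line ended up as a single string value and was then written out by a writer that escapes quotes by doubling them. Python's standard `csv` module, used here as a stand-in, then produces nearly the same line. Its output ends with `"""`, while the result above shows `""`.

```python
import csv
import io

# Assumption: the whole line became one string value because quote handling
# was off and it was later written out as a single CSV field.
raw_line = 'Name1;"1,456";"City1";"3";"pet"'

buf = io.StringIO()
# csv.writer defaults: delimiter ",", quotechar '"', doublequote=True,
# so the value is wrapped in quotes and each inner '"' becomes '""'.
writer = csv.writer(buf, lineterminator="\n")
writer.writerow([raw_line])
written = buf.getvalue().rstrip("\n")

print(written)  # "Name1;""1,456"";""City1"";""3"";""pet"""
```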
My plan is to remove the quotes, then remove the first 2 lines of the DataFrame so I can apply my custom schema. Thanks.
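One common approach in Spark is to read the file as plain text, drop the first two lines (for example with `zipWithIndex` on the underlying RDD), and pass the remaining lines to `spark.read.csv` together with `schema=...`, `sep=";"` and the default quote character. This works because PySpark's `DataFrameReader.csv` also accepts an RDD of strings. To keep the sketch runnable without a cluster, it uses Python's standard `csv` module instead of Spark. It skips the first 2 lines, parses with `;` as the delimiter and `"` as the quote character, and applies a schema. The column names and types in `SCHEMA` are hypothetical, and `1,456` is kept as text because the file does not say whether the comma is a thousands separator or a decimal comma.

```python
import csv
from itertools import islice

# File contents from the question: two header lines, then the data rows.
text = """Header1 blah blah
Header2 blah blah
Name1;"1,456";"City1";"3";"pet"
Name2;"3,450";"City2";"4";"not pet"
"""

# Hypothetical schema (column name -> converter). The real file has ~15
# columns; these names and types are illustrative only.
SCHEMA = [
    ("name", str),
    ("amount", str),    # "1,456": kept as text, separator meaning unknown
    ("city", str),
    ("quantity", int),
    ("category", str),
]

def parse(text, skip=2):
    """Drop the first `skip` lines, then parse the rest with the schema."""
    lines = islice(text.splitlines(), skip, None)
    reader = csv.reader(lines, delimiter=";", quotechar='"')
    return [
        {col: conv(value) for (col, conv), value in zip(SCHEMA, row)}
        for row in reader
    ]

records = parse(text)
print(records[0])
# {'name': 'Name1', 'amount': '1,456', 'city': 'City1', 'quantity': 3, 'category': 'pet'}
```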