I have a CSV file with 16 fields, and these two records in particular are not being parsed correctly:
1,"X","X",,"Y ""Y"", Y, Y","Y,Y,Y,Y,Y,Y,Y,Y,Y",,,,,,"X",,,,"X"
2,"X","X",,"""Y"" Y, Y","Y,Y,Y,Y",,,,,,"X","X",,,"X"
Expected parsed fields (shown with | marking the field boundaries):
1|"X"|"X"||"Y ""Y"", Y, Y"|"Y,Y,Y,Y,Y,Y,Y,Y,Y"||||||"X"||||"X"
2|"X"|"X"|"""Y"" Y, Y"|"Y,Y,Y,Y"||||||"X"|"X"|||"X"
For example, "Y,Y,Y,Y,Y,Y,Y,Y,Y"
is correctly parsed into a single column, but """Y"" Y, Y"
and "Y ""Y"", Y, Y"
both fail. Is there any way to correct this when using Spark to read the CSV? Is there some option I can use?
Note - the incoming data cannot be changed in any way, so escaping the double quotes in the landing data is not an option.
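
For reference, a minimal PySpark sketch of the kind of read I have in mind; the file path is a placeholder. Spark's CSV reader defaults to quote='"' and escape='\', so my guess is that the quote/escape options are the relevant knobs here:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-quote-test").getOrCreate()

# With the default escape character (backslash), RFC 4180-style doubled
# quotes ("") inside a quoted field are not collapsed back into a single
# quote. Pointing escape at the quote character itself is my attempt to
# make Spark treat "" as an escaped literal quote.
df = (
    spark.read
         .option("header", "false")
         .option("quote", '"')
         .option("escape", '"')      # assumption: this is the option in question
         .csv("/landing/data.csv")   # hypothetical path
)

df.show(truncate=False)
```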