I have a .csv file containing data like this (extra spaces added for readability):
1, 3 , "string" , "string4" , NA
2, 5 , "string" , "s\"tring\"4" , 3
1, 3 , "string" , "stri,ng4" , 5
8, 7 , "string" , "st\"ri,n\"g4" , 5
I am reading this into RStudio on a Windows 10 machine, using the following statement:
read.table("file_name.csv",fill=TRUE, header=FALSE, quote="\"", sep=",", encoding="UTF-8")
With the following response:
V1 V2 V3 V4 V5 V6
1 1 3 string string4 <NA> NA
2 2 5 string s\\tring\\4 3 NA
3 1 3 string stri,ng4 5 NA
4 8 7 string st\\ri n\\g4 5
The problem seems to be that the comma within the escaped quotes in row 4 is being interpreted as a separator.
I am expecting/looking for something like the following, but I'm not sure how to get it.
V1 V2 V3 V4 V5
1 1 3 string string4 <NA>
2 2 5 string s\"tring\"4 3
3 1 3 string stri,ng4 5
4 8 7 string st\"ri,n\"g4 5
I am considering preprocessing the file using grep to change \" to ', but I'm curious if there is a more direct method. It seems like a potentially common issue, but I can't find a good example of a solution.
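For reference, here is a minimal sketch of the preprocessing workaround I have in mind, done inside R rather than with an external tool. It assumes the real file has no spaces around the separators (as noted above), and it uses an inline character vector named raw in place of readLines("file_name.csv"); the names raw, cleaned, and df are just for illustration.

```r
# Sample lines standing in for readLines("file_name.csv", encoding = "UTF-8").
# Inside single-quoted R strings, \\ is one literal backslash.
raw <- c(
  '1,3,"string","string4",NA',
  '2,5,"string","s\\"tring\\"4",3',
  '1,3,"string","stri,ng4",5',
  '8,7,"string","st\\"ri,n\\"g4",5'
)

# Replace every backslash-escaped double quote (\") with a single quote.
# fixed = TRUE makes gsub match the two characters literally, not as a regex.
cleaned <- gsub('\\"', "'", raw, fixed = TRUE)

# Parse the cleaned lines; only the double quote is treated as a quoting
# character, so the substituted single quotes stay in the data.
df <- read.table(text = cleaned, header = FALSE, sep = ",",
                 quote = "\"", stringsAsFactors = FALSE)

print(df)
```

The obvious drawback is that this changes the data: the embedded double quotes come back as single quotes, and converting them back afterwards would also alter any single quotes that were in the file originally.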
Thoughts, anyone?