I have a CSV file with a few issues.
--Mismatched quotes
--Commas within those mismatched quotes.
Which is making reading the data in a nightmare.
I've read reading badly formed csv in R - mismatched quotes
Reading my file in using
rawData = read.csv(curFile, stringsAsFactors=FALSE, header=TRUE, quote="")
As suggested here: R Programming: "More Columns than Column Names"
I think its because of the unmatched quotes, but using read.csv(quote="") still gives me that error. Removing the quote="" allows me to read the file in (no more more columns than column names error) but it still reads incorrectly.
"@realdonaldtrump","870440000","870442502","Louis Tonelli","L00byLou26","364","292","",0,0,"Wed Mar 23 03:03:18 +0000 2016","RT @realDonaldTrump: Incompetent Hillary, despite the horrible attack in Brussels today, wants borders to be weak and open-and let the Musl&","7.1247e+17","712474777378820097","<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",NA,NA,NA,NA,NA,"7.1247e+17","712473816614772736","Wed Mar 23 02:59:29 +0000 2016","Donald J. Trump","realDonaldTrump","New York, NY","7259400","41","<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>"
Phone</a>"
"@realdonaldtrump","4831200000","4831194209","Chris Mattingly","_chrismattingly","605","194","Missouri, USA",0,0,"Wed Mar 23 03:03:18 +0000 2016","@realDonaldTrump <- Favorite buffoonish reply: ""Be careful, or [insert stock threat]"". How's the ""libel"" suit going? https://twitter.com","7.1247e+17","712474777064181761","<a href=""http://twitter.com/#!/download/ipad"" rel=""nofollow"">Twitter for iPad</a>",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"@realdonaldtrump","4799600000","4799556991","Leann Rehm Lawrence","rehm_leann","101","295","",0,0,"Tue Mar 22 21:24:15 +0000 2016","RT @TrumpDynastyUSA: KINDRED SPIRITSBrought to TEARS.
LOVE & HONOR the ""Apple of G-D's EYE!""
Deuteronomy 32:9-10
@ElianaBenador @realDona&","7.1239e+17","712389451679342593","<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>",NA,NA,NA,NA,NA,"7.1238e+17","712384968836718593","Tue Mar 22 21:06:26 +0000 2016","Lionhearted1","TrumpDynastyUSA","United States","5153","5140","<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>"
For reasons beyond my control, this is how the data appears in the file.
What I should be "reading in" here is 3 rows/observations.
Each line beginning with "@realdonaldtrump" is a new observation.
If anything falls below it, it should be part of the observation above it, but there are embedded \n or \r. That shouldn't matter, though, as its comma delimited.
However that does cause problems when commas fall in between.
The mismatch quotes problem is easy to see with line "2"