I've had this issue before, but my previous solution no longer fixes it.
When I open my text data in Notepad++ with "Show All Characters" enabled, a character displayed as [SUB] appears.
Previously, I replaced these with spaces like this:
```r
## Read the file in as binary
r = readBin(curFile, raw(), file.info(curFile)$size)
## Convert the pesky characters
if (r[1] == as.raw(0x1a))
{
  ## Find it
  spot = which(r == as.raw(0x1a))
  r[r == as.raw(0x1a)] = as.raw(0x20)
}
```
However, this isn't working. It seems that every time I manage to get rid of one invisible character, another one causes me problems within a week. Is there a way to effectively "clean" a file of all invisible control characters except the newlines that separate my data entries?
Please let me know. This is maddening already.
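One way to generalize the byte-level approach is to replace every ASCII control byte except the ones you want to keep, instead of testing for one specific value. Note that the `if` in the snippet above only checks `r[1]`, the first byte of the file, so a SUB byte further into the file is never replaced. (SUB, 0x1A, is Ctrl-Z, the old DOS end-of-file marker, which is a plausible reason it in particular breaks reading on Windows.) The sketch below is a minimal example, not a definitive fix. The function name `strip_control_bytes` is hypothetical, and it assumes an ASCII-compatible encoding such as UTF-8 or Latin-1: in UTF-8, bytes below 0x80 only ever encode ASCII characters, so multibyte characters are left intact. It would not be safe for UTF-16 files.

```r
# Hypothetical helper: replace every ASCII control byte (0x00-0x1F and 0x7F)
# with a space, except the ones listed in `keep`
# (by default tab 0x09, line feed 0x0A, carriage return 0x0D).
# Assumes an ASCII-compatible encoding (UTF-8, Latin-1); not for UTF-16.
strip_control_bytes <- function(r, keep = c(0x09, 0x0A, 0x0D), replacement = 0x20) {
  codes <- as.integer(r)                   # raw bytes -> integers 0..255
  is_ctrl <- codes < 0x20 | codes == 0x7F  # C0 control characters plus DEL
  bad <- is_ctrl & !(codes %in% keep)      # controls we do not want to keep
  r[bad] <- as.raw(replacement)            # overwrite them in place
  r
}
```

In the question's loop, `r = strip_control_bytes(r)` would take the place of both `if` blocks, just before `writeBin(r, curFile)`. Replacing with a space (as the original code does) keeps the file length unchanged; to delete the bytes instead, the function could return `r[!bad]`.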
I've put together a small CSV file for you all to try. The field in the second line, 4th column is what causes the crash:
http://www.megafileupload.com/6ead/stackOverflow.csv
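To see exactly which bytes are causing trouble, rather than spotting them one at a time in Notepad++, you could scan the raw bytes for control characters and report the line each one sits on. This is a minimal sketch; the function name `find_control_bytes` is hypothetical, line numbers are counted from line feeds, and it does not try to work out the CSV column, since quoted fields can contain commas.

```r
# Hypothetical diagnostic: list every control byte (other than those in `keep`)
# with its 1-based byte offset, 1-based line number, and hex value.
find_control_bytes <- function(path, keep = c(0x09, 0x0A, 0x0D)) {
  r <- readBin(path, raw(), file.info(path)$size)
  codes <- as.integer(r)
  bad <- which((codes < 0x20 | codes == 0x7F) & !(codes %in% keep))
  # Line number = 1 + number of line feeds strictly before the offending byte
  lf_before <- c(0L, cumsum(codes == 0x0A))[bad]
  data.frame(byte_offset = bad,
             line = lf_before + 1L,
             hex = sprintf("0x%02X", codes[bad]),
             stringsAsFactors = FALSE)
}
```

Running `find_control_bytes("stackOverflow.csv")` on the sample file would list each offending byte along with the line it is on, which makes it easier to tell whether it is always SUB or a different character each time.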
Here is the entire code I was using:
```r
library(stringr)

############# DO THIS FIRST
folder = "C:\\Twitter_TimeSeries\\Bernie_Practice\\"

## Get the file name of every .csv file in the directory
file.names = dir(folder, pattern = "\\.csv$")

## Figure out how many files there are
numFiles = length(file.names)

## Loop through every file
for (i in seq_len(numFiles))
{
  ## Which file are we on?
  curFile = paste0(folder, file.names[i])

  ## Read the file in as binary
  r = readBin(curFile, raw(), file.info(curFile)$size)

  ## Convert the pesky characters
  ## (note: this condition is only TRUE when the *first* byte of the file is 0x1A)
  if (r[1] == as.raw(0x1a))
  {
    ## Find it
    spot = which(r == as.raw(0x1a))
    r[r == as.raw(0x1a)] = as.raw(0x20)
  }

  ## (note: this tests for a leading line feed, 0x0A, but then replaces 0x1A bytes)
  if (r[1] == as.raw(0x0a)) {
    ## Find it
    spot = which(r == as.raw(0x0a))
    r[r == as.raw(0x1a)] = as.raw(0x20)
  } ## If

  ## Re-write the file
  writeBin(r, curFile)
} ## For

curFile = "stackOverflow.csv"
rawData = read.csv(curFile, stringsAsFactors = FALSE)
```
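A corrected sketch of the loop above, assuming the goal stays the same as the original (replace unwanted control bytes with spaces and overwrite each file in place). It drops the `r[1]` checks so bytes are replaced wherever they occur, keeps tab, line feed, and carriage return so the row structure survives, and only rewrites files that actually contained such bytes. The function name `clean_csv_folder` is hypothetical. Because it overwrites files, it is worth trying on a copy of the data first.

```r
# Hypothetical corrected loop: clean every .csv file in `folder` in place.
clean_csv_folder <- function(folder, keep = c(0x09, 0x0A, 0x0D)) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  for (f in files) {
    r <- readBin(f, raw(), file.info(f)$size)
    codes <- as.integer(r)
    # Unwanted control bytes anywhere in the file, not just at position 1
    bad <- (codes < 0x20 | codes == 0x7F) & !(codes %in% keep)
    if (any(bad)) {              # only rewrite files that actually change
      r[bad] <- as.raw(0x20)     # same space replacement as the original code
      writeBin(r, f)
    }
  }
  invisible(files)
}
```

With the folder from the question, this would be `clean_csv_folder("C:\\Twitter_TimeSeries\\Bernie_Practice\\")`, followed by the same `read.csv(curFile, stringsAsFactors = FALSE)` call.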