By convention, a text file cannot contain non-printable control characters (apart from whitespace such as newlines and tabs), and in particular it cannot contain NUL. If a file contains such characters, it isn’t a text file; it’s a binary file.
R strictly¹ adheres to this convention: its character strings cannot contain NUL characters at all. So you really need to read and treat the data as binary data, which means using `readBin` and the `raw` data type:
```r
n = file.size(filename)
buffer = readBin(filename, 'raw', n = n)
# Unfortunately the above has a race condition: the file may change between
# the two calls, so check that the size hasn’t changed!
stopifnot(n == file.size(filename))
```
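To see why the detour through `raw` is necessary, here’s a small sketch. It writes a made-up temporary file containing an embedded NUL, reads it as above, and shows that converting the untouched buffer into a string fails:

```r
# Hypothetical sample file: "abc", a NUL byte, then "def" and a newline
filename = tempfile()
writeBin(as.raw(c(0x61, 0x62, 0x63, 0x00, 0x64, 0x65, 0x66, 0x0a)), filename)

n = file.size(filename)
buffer = readBin(filename, 'raw', n = n)
stopifnot(n == file.size(filename))

# Index of the NUL byte within the raw buffer
which(buffer == 0L)  # returns 4

# Converting the unmodified buffer fails, since R strings cannot hold NUL
result = tryCatch(rawToChar(buffer), error = function(e) conditionMessage(e))
print(result)
```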
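Now we can fix the buffer by removing the embedded zero bytes. This assumes UTF-8 or ASCII encoding, where a zero byte can only ever encode NUL itself! Other encodings, such as UTF-16 and UTF-32, use zero bytes inside ordinary characters (UTF-16LE encodes “A” as the bytes 0x41 0x00), so there the zero bytes need to be interpreted rather than dropped!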
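```r
# Drop every zero byte (the raw vector is coerced to integer for the comparison)
buffer = buffer[buffer != 0L]
# Without NULs, the bytes can be turned into a single character string
text = rawToChar(buffer)
```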
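Putting the steps together, here’s a sketch of a small helper. The name `read_text_dropping_nul` is made up for this example, it assumes the file is UTF-8 or ASCII, and unlike the steps above it also marks the encoding and splits the result into lines:

```r
# Hypothetical helper; assumes the file is encoded as UTF-8 or ASCII
read_text_dropping_nul = function(filename) {
    n = file.size(filename)
    buffer = readBin(filename, 'raw', n = n)
    # Guard against the file changing size between the two calls
    stopifnot(n == file.size(filename))
    # Safe only because UTF-8/ASCII never use 0x00 inside other characters
    buffer = buffer[buffer != 0L]
    text = rawToChar(buffer)
    # Declare the bytes as UTF-8 (an assumption about the input)
    Encoding(text) = 'UTF-8'
    # rawToChar() returns one string; split it into lines
    strsplit(text, '\n', fixed = TRUE)[[1]]
}

# Hypothetical sample file: "a", NUL, "b", newline, "c", newline
filename = tempfile()
writeBin(as.raw(c(0x61, 0x00, 0x62, 0x0a, 0x63, 0x0a)), filename)
read_text_dropping_nul(filename)
```

Note that `strsplit` drops a trailing empty string, so a final newline doesn’t produce an extra empty line at the end.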
¹ Maybe too strictly…