I'm trying to understand an inconsistency in the behaviour of R's file() function.
Example:
# openssl for producing hash values
require(openssl)
# sample data
data(mtcars)
saveRDS(mtcars, './mtcars.rds')
saveRDS(mtcars, './mtcars测试.rds')
# if file name is ascii, file() produces different outputs
# for raw = FALSE/TRUE
sha2(file('./mtcars.rds', raw = FALSE), size = 256L)
sha2(file('./mtcars.rds', raw = TRUE), size = 256L)
# if file name contains unicode characters,
# the output stays the same regardless of raw = FALSE/TRUE
sha2(file('./mtcars测试.rds', raw = FALSE), size = 256L)
sha2(file('./mtcars测试.rds', raw = TRUE), size = 256L)
# But text files are not affected
writeLines(text = 'openssl, mtcars, 天地玄黄', con = './mtcars.txt')
writeLines(text = 'openssl, mtcars, 天地玄黄', con = './mtcars测试.txt')
sha2(file('./mtcars.txt', raw = FALSE), size = 256L)
sha2(file('./mtcars.txt', raw = TRUE), size = 256L)
sha2(file('./mtcars测试.txt', raw = FALSE), size = 256L)
sha2(file('./mtcars测试.txt', raw = TRUE), size = 256L)
I tested the above on both Windows (R 3.3.3 x64) and CentOS (R 3.4.0 x64), with openssl version 0.9.6.
My question is, why is this happening?
My best guess so far:
- file() uses a special method if it is given a compressed (zip) file. (And I know that an RDS file is an R object gzipped to disk.)
- Some low-level function in R still doesn't like Unicode characters in file names.
- A file name containing Unicode characters blinds the compressed-file detection and forces file() to behave as if raw = TRUE.
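To check the first guess without involving openssl, here is a minimal sketch that looks at the leading bytes of an RDS file directly. It assumes saveRDS() is called with its default compress = TRUE (gzip); magic_bytes() and inflated_bytes() are hypothetical helper names I made up for illustration, and a temporary ASCII path is used so the sketch does not depend on the locale.

```r
# Hypothetical helper: first two bytes of the file exactly as stored on disk.
# readBin() given a path opens the file in binary mode and does not decompress.
magic_bytes <- function(path) readBin(path, what = "raw", n = 2L)

# Hypothetical helper: first two bytes after explicit gzip decompression.
inflated_bytes <- function(path) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  readBin(con, what = "raw", n = 2L)
}

path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)  # assumption: default compress = TRUE, i.e. gzip

on_disk  <- magic_bytes(path)
inflated <- inflated_bytes(path)

print(on_disk)   # bytes as stored on disk
print(inflated)  # bytes of the decompressed stream

# gzip streams start with the magic number 0x1f 0x8b
is_gzip <- identical(on_disk, as.raw(c(0x1f, 0x8b)))
print(is_gzip)
```

If the two byte pairs differ, then a connection that decompresses transparently and one that does not will feed different data to sha2(), which would match what I see for the ASCII-named RDS file. Pointing path at './mtcars测试.rds' instead should show whether the on-disk bytes are read the same way for the Unicode name.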