
I'm trying to understand an inconsistency in the behaviour of R's file() function.

Example:

# openssl package, used to compute the hash values
library(openssl)

# sample data
data(mtcars)
saveRDS(mtcars, './mtcars.rds')
saveRDS(mtcars, './mtcars测试.rds')

# If the file name is ASCII, file() produces different output
# for raw = FALSE and raw = TRUE
sha2(file('./mtcars.rds', raw = FALSE), size = 256L)
sha2(file('./mtcars.rds', raw = TRUE), size = 256L)

# If the file name contains Unicode characters,
# the output is the same for raw = FALSE and raw = TRUE
sha2(file('./mtcars测试.rds', raw = FALSE), size = 256L)
sha2(file('./mtcars测试.rds', raw = TRUE), size = 256L)

# But text files are not affected
writeLines(text = 'openssl, mtcars, 天地玄黄', con = './mtcars.txt')
writeLines(text = 'openssl, mtcars, 天地玄黄', con = './mtcars测试.txt')

sha2(file('./mtcars.txt', raw = FALSE), size = 256L)
sha2(file('./mtcars.txt', raw = TRUE), size = 256L)

sha2(file('./mtcars测试.txt', raw = FALSE), size = 256L)
sha2(file('./mtcars测试.txt', raw = TRUE), size = 256L)

I tested the above on both Windows (R 3.3.3 x64) and CentOS (R 3.4.0 x64), with version 0.9.6 of the openssl package.

My question is, why is this happening?

My best guess so far:

  • file() uses a special method when given a compressed file. (And I know an RDS file is an R object gzipped to disk.)
  • Some low-level functions in R still don't handle Unicode characters in file names well.
  • A file name containing Unicode characters defeats the compressed-file detection, so file() behaves as if raw = TRUE.
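
The first guess can be probed directly. The following is only a rough sketch, run on a temporary file with an ASCII name: it checks whether the file starts with the gzip magic bytes and then asks summary() which connection class file() created for each raw setting. The helper connection_class() is a hypothetical name introduced here for illustration, and the assumption that an unopened file() connection reports a different class for a compressed file has not been verified on the platforms above.

```r
# Write a small RDS file to a temporary path (saveRDS compresses with gzip by default).
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)

# A gzip stream starts with the two bytes 0x1f 0x8b.
magic <- readBin(path, what = "raw", n = 2L)
is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

# Hypothetical helper: create an unopened connection, as in the sha2() calls
# above, and report the class R assigned to it.
connection_class <- function(path, raw) {
  con <- file(path, raw = raw)
  on.exit(close(con))
  summary(con)$class
}

class_default <- connection_class(path, raw = FALSE)
class_raw <- connection_class(path, raw = TRUE)

print(is_gzip)
print(class_default)
print(class_raw)
```

Running the same helper on './mtcars测试.rds' and comparing the reported classes would show whether the detection step is skipped for that file name.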
Konrad Rudolph
Yifeng Mu
