1

I have a small Lua function to check whether a file exists:

function file_exists( filePath )
    local handler = io.open( filePath )
    if handler then
        io.close( handler )
        return true
    end
    return false
end

However, this always returns false when the file path contains special characters such as German umlauts (äöü). Is there any way around this?

Thanks a lot!

silent
  • You need to encode file names in Windows default codepage (usually win12xx) instead of UTF-8. – Egor Skriptunoff Apr 19 '16 at 14:30
  • sorry, what do you mean by that? I get my file paths from a different source which initially stores them – silent Apr 19 '16 at 14:38
  • I mean your paths (received from a different source) may be in wrong codepage. Check it with `yourpath:byte(1,-1)` to guess the encoding. – Egor Skriptunoff Apr 19 '16 at 15:50
  • Yes, they are stored in UTF-8. But I guess there is no way in Lua to convert them into a different codepage, is there? – silent Apr 20 '16 at 10:44
  • Of course, there is a way. What is your Windows codepage? You can determine your codepage from `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP` or by running this program in LuaJIT: `f=require"ffi" f.cdef"int GetACP();" print(f.C.GetACP())` – Egor Skriptunoff Apr 20 '16 at 13:43
  • According to the registry it's 1252. The Lua code you provided did not work (module "ffi" not found). – silent Apr 20 '16 at 14:37
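
To make the `yourpath:byte(1,-1)` check from the comments concrete, here is a minimal sketch. The helper name `dump_bytes` is made up for illustration. An "ä" stored as UTF-8 shows up as the two bytes 195 164, while the same letter in CP1252 is the single byte 228.

```lua
-- Hypothetical helper (not part of any library): list the byte values of a
-- string so the encoding of a path can be guessed by eye.
-- Intended for short strings such as file paths.
function dump_bytes(s)
   return table.concat({ s:byte(1, -1) }, " ")
end

-- "ä" written with decimal escapes so the example does not depend on the
-- encoding of this source file.
print(dump_bytes("\195\164"))  -- UTF-8 encoded "ä"
print(dump_bytes("\228"))      -- CP1252 encoded "ä"
```

If the umlauts in your paths come out as two bytes starting with 195, the paths are UTF-8 and need converting before they reach `io.open` on Windows.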

2 Answers

2
utf8_to_cp1252 = (
   function(cp1252_description)
      local unicode_to_1252 = {}
      for code, unicode in cp1252_description:gmatch'\n0x(%x%x)%s+0x(%x+)' do
         unicode_to_1252[tonumber(unicode, 16)] = tonumber(code, 16)
      end
      local undefined = ('?'):byte()
      return
         function (utf8str)
            local pos, result = 1, {}
            while pos <= #utf8str do
               -- code: code point decoded so far; size: bytes consumed
               local code, size = utf8str:byte(pos, pos), 1
               if code >= 0xC0 and code < 0xFE then
                  -- lead byte of a multi-byte sequence
                  local mask = 64
                  code = code - 128
                  repeat
                     local next_byte = utf8str:byte(pos+size, pos+size) or 0
                     if next_byte >= 0x80 and next_byte < 0xC0 then
                        -- valid continuation byte: append its 6 payload bits
                        code, size = (code - mask - 2) * 64 + next_byte, size+1
                     else
                        -- malformed sequence: fall back to the raw lead byte
                        code, size = utf8str:byte(pos, pos), 1
                     end
                     mask = mask * 32
                  until code < mask
               end
               pos = pos + size
               table.insert(result, 
                  string.char(unicode_to_1252[code] or undefined))
            end
            return table.concat(result)
         end
   end
)[[
download 
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
and insert the whole text here:

#
#    Name:     cp1252 to Unicode table
#    Unicode version: 2.0
#    Table version: 2.01
..................................
0xFD    0x00FD  #LATIN SMALL LETTER Y WITH ACUTE
0xFE    0x00FE  #LATIN SMALL LETTER THORN
0xFF    0x00FF  #LATIN SMALL LETTER Y WITH DIAERESIS
]]

Usage:

cp1252_filename = utf8_to_cp1252(your_utf8_filename)

Now you can pass cp1252_filename to io.open(), os.rename(), os.execute() and other functions from the standard Lua library.
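
For the original `file_exists` question, the conversion can be wrapped like this. This is only a sketch: to keep it runnable without pasting the mapping table, it uses a reduced stand-in converter (the name `utf8_latin1_to_cp1252` is made up here) that handles ASCII and U+00A0..U+00FF only, the range where CP1252 bytes equal the Unicode code points. That covers the German umlauts but not characters such as €; use the full table-driven function for those.

```lua
-- Hypothetical reduced converter (assumption: only ASCII and U+00A0..U+00FF
-- are needed). Anything else becomes "?".
function utf8_latin1_to_cp1252(s)
   local out, i = {}, 1
   while i <= #s do
      local b1 = s:byte(i)
      local b2 = s:byte(i + 1)
      if b1 < 0x80 then
         -- plain ASCII byte, copied unchanged
         out[#out + 1] = string.char(b1)
         i = i + 1
      elseif (b1 == 0xC2 or b1 == 0xC3) and b2 and b2 >= 0x80 and b2 < 0xC0 then
         -- two-byte sequence: code point is in U+0080..U+00FF
         local cp = (b1 - 0xC0) * 64 + (b2 - 0x80)
         out[#out + 1] = cp >= 0xA0 and string.char(cp) or "?"
         i = i + 2
      else
         -- outside the supported range: emit "?" and skip continuation bytes
         out[#out + 1] = "?"
         i = i + 1
         while i <= #s and s:byte(i) >= 0x80 and s:byte(i) < 0xC0 do
            i = i + 1
         end
      end
   end
   return table.concat(out)
end

-- The check from the question, unchanged in behaviour.
function file_exists(filePath)
   local handler = io.open(filePath)
   if handler then
      io.close(handler)
      return true
   end
   return false
end

-- Convert first, then test for existence.
function file_exists_utf8(utf8Path)
   return file_exists(utf8_latin1_to_cp1252(utf8Path))
end
```

Swapping `utf8_latin1_to_cp1252` for the table-driven `utf8_to_cp1252` gives the same wrapper with full CP1252 coverage.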

Egor Skriptunoff
1

Lua and its tiny standard library are platform-neutral and do not know about the Windows functions needed to read full Unicode file names. You can use the winapi module to get some Windows-specific functions for this task. Note that this approach requires short name generation to be enabled on the target disk.

local handler = io.open( winapi.short_path(filePath) )
if handler then
    -- etc
end

It can also be easily installed through LuaRocks: luarocks install winapi.
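
A slightly more defensive version of the snippet above might look like the following. This is a sketch under some assumptions: the function name `file_exists_short` is made up, the `set_encoding` / `CP_UTF8` call is guarded because I am not certain every winapi build exposes it, and the code falls back to the plain path when the module is missing or no short name comes back.

```lua
-- Load winapi if it is available; ok is false when the module is missing.
local ok, winapi = pcall(require, "winapi")

-- Hypothetical wrapper: try the 8.3 short path first, then the path as given.
function file_exists_short(utf8Path)
   local path = utf8Path
   if ok and type(winapi) == "table" then
      -- Assumption: paths arrive as UTF-8, so ask winapi to treat them so.
      if winapi.set_encoding and winapi.CP_UTF8 then
         winapi.set_encoding(winapi.CP_UTF8)
      end
      local short = winapi.short_path(utf8Path)
      if type(short) == "string" and short ~= "" then
         path = short
      end
   end
   local handler = io.open(path)
   if handler then
      io.close(handler)
      return true
   end
   return false
end
```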

Oleg V. Volkov
  • I should add that we are using Lua as part of another application, which ships the Lua component. Could installing the Lua extension with LuaRocks work anyway? We do not have LuaRocks installed yet. – silent Apr 19 '16 at 14:39
  • You can simply put all the files in any path that `require` would read from by hand. – Oleg V. Volkov Apr 19 '16 at 15:03
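
To find out where the embedded interpreter's `require` actually looks, you can print its search templates from inside the host application. A minimal sketch; `search_templates` is a made-up helper name.

```lua
-- Hypothetical helper: split a package.path / package.cpath style string
-- into its individual search templates.
function search_templates(pathstr)
   local list = {}
   for entry in pathstr:gmatch("[^;]+") do
      list[#list + 1] = entry
   end
   return list
end

-- package.path is searched for Lua modules, package.cpath for binary
-- modules (.dll on Windows); the "?" is replaced by the module name.
for _, entry in ipairs(search_templates(package.path)) do print("lua: " .. entry) end
for _, entry in ipairs(search_templates(package.cpath)) do print("bin: " .. entry) end
```

Copying the module's files into any of the printed locations should make `require "winapi"` find them, provided the binary was built for the same Lua version and architecture as the host.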