1

I have a bunch of XML files with Japanese names. I use Lua to read them and put the necessary information into tables. I can open files whose name is a single kanji, like 名.xml, but files whose name has several kanji, like 名前.xml, fail to open. Before running the Lua file, I set the command line's code page to 65001 (UTF-8). To read the files, I convert each filename from UTF-8 to the ACP (ANSI code page) using the winapi library, but this conversion only works for the single-kanji names. I've tried several suggestions from around the internet, such as using the short path to the file, but none of them worked. I also tried the short path while running Lua as administrator, since a similar question states that administrator privileges are needed to use short paths, but no luck.

...
for fn in io.popen("DIR xml /B /AA"):lines() do
    ...
    local f = assert(io.open("xml\\" .. winapi.encode(winapi.CP_UTF8, winapi.CP_ACP, fn), "rb"))
    ...
end
...

But my code produces an "Invalid argument" error. I searched for this error, but none of the results were Lua-related, so I looked at the C/C++ ones, and all I found was advice along the lines of 'use _wfopen'. That is not available in Lua, and I don't want to implement it myself. Does anyone have an idea how to solve this? If you need more information, just let me know. Thanks!

Dousea
  • What does `winapi.encode()` return? Please show the output of `print(fn:byte(1,-1)); print(winapi.encode(winapi.CP_UTF8, winapi.CP_ACP, fn):byte(1,-1))` for some short filename (e.g., "名前.xml") – Egor Skriptunoff Mar 09 '17 at 11:29
  • And what is your ACP (ANSI code page)? You can view it in the Windows registry at `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP` – Egor Skriptunoff Mar 09 '17 at 11:33
  • @EgorSkriptunoff From UTF-8: `229 144 141 229 137 141 46 120 109 108` to ACP: `150 188 145 79 46 120 109 108` and my ACP is 932. – Dousea Mar 09 '17 at 15:33
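
The UTF-8 byte values quoted in the comment above can be reproduced without the winapi library. The following is only a minimal sketch, assuming any standard Lua 5.1+ interpreter; the helper name `dump_bytes` is hypothetical and not part of the thread. The ACP bytes are copied from the comment rather than computed, since computing them would require a real CP932 conversion.

```lua
-- Hypothetical helper (not from the original thread): returns the decimal
-- byte values of a string, separated by spaces.
local function dump_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = tostring(s:byte(i))
  end
  return table.concat(out, " ")
end

-- "名前.xml" in UTF-8, written with decimal escapes so that the encoding
-- of this source file does not matter.
local utf8_name = "\229\144\141\229\137\141.xml"
print(dump_bytes(utf8_name))

-- The ACP (code page 932) bytes reported in the comment, copied verbatim.
local acp_name = "\150\188\145\79.xml"
print(dump_bytes(acp_name))
```

Each kanji takes three bytes in the UTF-8 string and two bytes in the code page 932 string, which is why the second dump is shorter.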

2 Answers

2

I don't know why your program does not work, but try this workaround:

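-- Let cmd.exe open the files: for each file in xml\ it prints the content,
-- then a marker line with the file name, so Lua never passes the name to io.open.
local pipe = io.popen([[for %G in (xml\*) do @(type "%G" & echo @FILENAMEMARKER#%G)]], "rb")
local all_files = pipe:read"*a"
pipe:close()
-- Split the combined output into (content, name) pairs at each marker line.
for filecontent, filename in all_files:gmatch"(.-)@FILENAMEMARKER#(.-)\r?\n" do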
   -- process your file here
   print('===== This is your file name:')
   print(filename)
   print('== This is your file content:')
   print(filecontent)
   print('== End of file')
end
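
To see how the `gmatch` pattern in the workaround above splits the combined output, here is a small sketch that runs without Windows. The string `sample` is a made-up stand-in for what the pipe would return (two hypothetical files, `xml\a.xml` and `xml\b.xml`); it is not real output from `cmd.exe`.

```lua
-- Made-up stand-in for the pipe output: each file's content is followed by
-- a marker line carrying its name (CRLF line endings assumed).
local sample =
  "<root>A</root>\r\n" ..
  "@FILENAMEMARKER#xml\\a.xml\r\n" ..
  "<root>B</root>\r\n" ..
  "@FILENAMEMARKER#xml\\b.xml\r\n"

-- Same pattern as in the workaround: lazy capture of the content up to the
-- marker, then lazy capture of the name up to the end of the marker line.
local files = {}
for filecontent, filename in sample:gmatch("(.-)@FILENAMEMARKER#(.-)\r?\n") do
  files[#files + 1] = { name = filename, content = filecontent }
end

for _, f in ipairs(files) do
  print(f.name, #f.content)
end
```

Note that each captured content keeps its trailing line break, and that a file whose content itself contains the marker text would be split in the wrong place.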
Egor Skriptunoff
0

I think you can put the Japanese alphabet in a table, like this:

local jaAlphbet={"一","|","丶","ノ","乙","亅","<","二","亠","人","⺅","","儿","入","ハ","丷","冂","冖","冫","几","凵","刀","⺉","力","勹","匕","匚","十","卜","卩","厂","厶","又","マ","九","ユ","乃","","⻌","口","囗","土","士","夂","夕","大","女","子","宀","寸","小","⺌","尢","尸","屮","山","川","巛","工","已","巾","干","幺","广","廴","廾","弋","弓","ヨ","彑","彡","彳","⺖","⺘","⺡","⺨","⺾","⻏","⻖","也","亡","及","久","⺹","心","戈","戸","手","支","攵","文","斗","斤","方","无","日","曰","月","木","欠","止","歹","殳","比","毛","氏","气","水","火","⺣","爪","父","爻","爿","片","牛","犬","⺭","王","元","井","勿","尤","五","屯","巴","毋","玄","瓦","甘","生","用","田","疋","疒","癶","白","皮","皿","目","矛","矢","石","示","禸","禾","穴","立","⻂","世","巨","冊","母","⺲","牙","瓜","竹","米","糸","缶","羊","羽","而","耒","耳","聿","肉","自","至","臼","舌","舟","艮","色","虍","虫","血","行","衣","西","臣","見","角","言","谷","豆","豕","豸","貝","赤","走","足","身","車","辛","辰","酉","釆","里","舛","麦","金","長","門","隶","隹","雨","青","非","奄","岡","免","斉","面","革","韭","音","頁","風","飛","食","首","香","品","馬","骨","高","髟","鬥","鬯","鬲","鬼","竜","韋","魚","鳥","鹵","鹿","麻","亀","啇","黄","黒","黍","黹","無","歯","黽","鼎","鼓","鼠","鼻","齊","龠"}
print(jaAlphbet[1]) -- you can then access the letters one by one

Sorry, that's all I know about this subject, but I hope this helps.

joshua chris