I create temporary files from Ruby in the user folder, using Dir.mktmpdir() to create a temporary folder that I put my files into.

For some reason, that function gives me a Windows short folder name, so instead of e.g. C:\Users\very long username\AppData\Local\Temp\20181018-5548 I get something like C:\Users\VERYLO~1\AppData\Local\Temp\20181018-5548. This is the root cause of my problem, but I don't think I can really fix it, because I have to work in an environment that I don't have much control over (Ruby 2.0 and 2.2 embedded in SketchUp, to be exact). This also limits how many external Ruby libraries I can sensibly bring in.

To get the actual long folder name, I have been calling the Win32 function GetLongPathName() through the Win32API class, and that has been largely successful.

However, I am running into trouble with special characters. It seems that the buffers I send to the Win32 API (and the buffers it returns) are expected to be in a certain encoding, while the Ruby strings are UTF-8 (or are assumed to be UTF-8 when they come back). I'd gladly make whatever encoding change is needed, but I am somewhat lost as to which encoding to go for. I am not even sure whether the wide-character version of the Win32 API is being used.
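To illustrate the kind of mismatch described above, here is a small pure-Ruby sketch (no Win32 calls, so it runs anywhere). It shows what happens when UTF-8 bytes are read by code that assumes Windows-1252, the ANSI code page of English and Brazilian Portuguese Windows editions. The folder name is a hypothetical example.

```ruby
# A hypothetical folder name containing one non-ASCII character.
name = "C:\\Users\\Jo\u00E3o"

# The same text as UTF-8 bytes (what a Ruby string usually holds) ...
utf8_bytes = name.encode(Encoding::UTF_8).bytes
# ... and as Windows-1252 bytes (what an ANSI "A" function expects on such systems).
ansi_bytes = name.encode(Encoding::Windows_1252).bytes

# Reading the UTF-8 bytes as if they were Windows-1252 produces mojibake.
misread = name.encode(Encoding::UTF_8).dup
              .force_encoding(Encoding::Windows_1252)
              .encode(Encoding::UTF_8)

puts utf8_bytes.last(3).map { |b| format("%02X", b) }.join(" ")  # C3 A3 6F
puts ansi_bytes.last(2).map { |b| format("%02X", b) }.join(" ")  # E3 6F
puts misread                                                     # C:\Users\JoÃ£o
```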

There is something that adds to the weirdness and makes me wonder whether I am barking up the wrong tree: using an international and/or English version of Windows, I can send all kinds of special characters into the Win32 API call and receive them back without problems. However, as soon as I use a different language edition of Windows (I tried Brazilian Portuguese, and people with Hebrew and some Eastern European editions of Windows have reported the problem as well), it stops working.
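Non-wide ("A") Win32 functions interpret each byte according to the system's ANSI code page, which is set by the system locale ("Language for non-Unicode programs"). The pure-Ruby sketch below decodes one and the same byte under three code pages Ruby knows about, Windows-1252 (Western European), Windows-1251 (Cyrillic) and Windows-1255 (Hebrew), to show how the meaning of a byte buffer depends on that setting.

```ruby
# One raw byte, as an ANSI ("A") Win32 function would receive it.
byte = "\xE3".b

# Decode that byte under several Windows code pages.
decoded = Hash[%w[Windows-1252 Windows-1251 Windows-1255].map { |cp|
  [cp, byte.dup.force_encoding(cp).encode(Encoding::UTF_8)]
}]

decoded.each do |cp, char|
  puts "#{cp}: #{char} (U+#{format('%04X', char.ord)})"
end
# Windows-1252: ã (U+00E3)
# Windows-1251: г (U+0433)
# Windows-1255: ד (U+05D3)
```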

Here is the code I am using:

def self.get_long_win32_filename(short_name)
    require 'Win32API'
    max_path = 1024
    long_name = " " * max_path
    lfn_size = Win32API.new("kernel32", "GetLongPathName", ['P','P','L'],'L').call(short_name, long_name, max_path)
    return (1..max_path).include?(lfn_size) ? long_name[0..lfn_size-1] : short_name
end

Any help in figuring out how to approach the encoding problem when passing strings into and out of the Win32 API very much appreciated!

  • I have no Windows box on hand to check, but AFAIR Windows uses UTF-16 to store file names. Just convert to UTF-16 and you should be all set. – Aleksei Matiushkin Oct 18 '18 at 10:43
  • Hi @AlekseiMatiushkin, thanks for the comment! That is (unfortunately) not the whole story. Windows does use UTF-16 when the wide-character API is used, but I don't even know whether that is the case here. I actually suspect it isn't, because otherwise where would the dependency on the Windows language edition come from? Furthermore, I tried converting to UTF-16 by adding the line `short_name = short_name.encode(Encoding::UTF_16)` and that did not help. – Timm Dapper Oct 18 '18 at 12:52
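One likely reason why `encode(Encoding::UTF_16)` alone does not help, besides `GetLongPathName` as called in the question not being the `W` variant: Ruby's generic `UTF_16` encoding emits a byte-order mark followed by big-endian code units, while the wide Win32 functions expect little-endian UTF-16 without a BOM, terminated by a two-byte NUL. A quick pure-Ruby comparison:

```ruby
path = "C:"

# Ruby's generic UTF-16: byte-order mark (FE FF) followed by big-endian units.
generic = path.encode(Encoding::UTF_16).bytes
# What wide ("W") Win32 functions expect: little-endian units plus a wide NUL.
wide = (path + "\0").encode(Encoding::UTF_16LE).bytes

puts generic.map { |b| format("%02X", b) }.join(" ")  # FE FF 00 43 00 3A
puts wide.map { |b| format("%02X", b) }.join(" ")     # 43 00 3A 00 00 00
```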

3 Answers

To work with these languages, you need to use GetLongPathNameW. The only catch is that this function takes wide (UTF-16) strings, so you'll need to call the WinAPI functions that convert multi-byte strings to wide strings and vice versa.

The windows-pr gem has these functions already neatly defined as multi_to_wide and wide_to_multi. I don't know if you can use gems in your environment; if you can't, try these "simplified" versions:

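require 'Win32API'

def mb_to_wide(str)
  # CP_UTF8 = 65001; a length of -1 means "NUL-terminated", so the count includes the NUL
  str = str.encode(Encoding::UTF_8)
  multi_to_wide = Win32API.new("kernel32", "MultiByteToWideChar", 'ILSIPI', 'I')
  wsize = multi_to_wide.call(65001, 0, str, -1, nil, 0)
  if wsize > 0
    wstr = "\0" * (wsize * 2) # wsize counts wide characters, 2 bytes each
    multi_to_wide.call(65001, 0, str, -1, wstr, wsize)
    wstr
  end
end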

def wide_to_mb(wstr)
  wstr = wstr.dup.force_encoding(Encoding::BINARY)
  wstr << "\0\0" unless wstr.end_with?("\0\0") # add a wide NUL terminator if not found
  wide_to_multi = Win32API.new("kernel32", "WideCharToMultiByte", 'ILSIPIPP', 'I')
  size = wide_to_multi.call(65001, 0, wstr, -1, nil, 0, nil, nil)
  if size > 0
    str = "\0" * size # size is in bytes and includes the terminating NUL
    wide_to_multi.call(65001, 0, wstr, -1, str, size, nil, nil)
    str.force_encoding(Encoding::UTF_8)[/\A[^\0]*/] # up to the first \0
  end
end
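On Ruby 1.9 and later (which includes the Ruby 2.0/2.2 builds mentioned in the question), `String#encode` can build and decode the same UTF-16LE buffers without calling `MultiByteToWideChar`/`WideCharToMultiByte` at all. A minimal sketch; the helper names `to_wide` and `from_wide` are hypothetical, not part of windows-pr or Win32API.

```ruby
# Hypothetical helper: convert a Ruby string (any encoding Ruby can transcode)
# into a NUL-terminated UTF-16LE byte buffer for a "W" Win32 function.
def to_wide(str)
  (str.encode(Encoding::UTF_8) + "\0")
    .encode(Encoding::UTF_16LE)
    .force_encoding(Encoding::BINARY)
end

# Hypothetical helper: convert a UTF-16LE buffer back into a UTF-8 string,
# stopping at the first wide NUL (a 0x0000 code unit at an even byte offset).
def from_wide(buf)
  units = buf.dup.force_encoding(Encoding::BINARY).unpack("v*")
  nul = units.index(0)
  units = units.first(nul) if nul
  units.pack("v*").force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end
```

For example, `to_wide(short_name)` could stand in for `mb_to_wide(short_name)`, and `from_wide(buffer)` for `wide_to_mb(buffer)`. Working on whole 16-bit units avoids mistaking the zero high byte of an ASCII character for part of a terminator.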

Then you'll just need to adapt your function to something like this:

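def self.get_long_win32_filename(short_name)
  max_path = 1024 # buffer capacity in wide characters, not bytes
  long_name = "\0" * (max_path * 2) # 2 bytes per UTF-16 code unit
  wshort_name = mb_to_wide(short_name)
  return short_name unless wshort_name
  size = Win32API.new("kernel32", "GetLongPathNameW", 'PPL', 'L').call(wshort_name, long_name, max_path)
  # size is the result length in wide characters without the NUL; 0 means failure,
  # and a value >= max_path means the buffer was too small
  return short_name unless (1...max_path).include?(size)
  wide_to_mb(long_name.byteslice(0, size * 2))
end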
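The return value of `GetLongPathNameW` has three cases: 0 on failure, the required buffer size in wide characters (including the NUL) when the buffer is too small, and otherwise the length of the long path in wide characters, not counting the NUL. The sketch below uses a hypothetical helper `extract_long_path` that turns such a buffer into UTF-8 using only `String` methods, so it can be tried without Windows by feeding it a simulated buffer.

```ruby
# Hypothetical helper.
# buffer   - raw bytes written by GetLongPathNameW (UTF-16LE)
# size     - the function's return value
# max_path - buffer capacity in wide characters that was passed in
# Returns the long path as UTF-8, or nil on failure or if the buffer was too small.
def extract_long_path(buffer, size, max_path)
  return nil if size == 0 || size >= max_path
  buffer.byteslice(0, size * 2)
        .force_encoding(Encoding::UTF_16LE)
        .encode(Encoding::UTF_8)
end

# Simulated buffer: a UTF-16LE path, a wide NUL, then unused space.
long = "C:\\Users\\Jo\u00E3o Silva"
buf = (long + "\0").encode(Encoding::UTF_16LE).force_encoding(Encoding::BINARY) + ("\0" * 20).b
puts extract_long_path(buf, long.length, 1024)
```

Slicing with `byteslice` rather than by characters matters here: the buffer's nominal Ruby encoding does not match its UTF-16LE contents, so character-based indexing could count the wrong number of bytes.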

This may solve the problem stated in the question title and in most of your post, but maybe you're asking the wrong question: you could look for a simpler way to create a temporary folder that doesn't return a short path to begin with, so you wouldn't need to go through all of this.

Toribio
  • Thanks for the answer, Flávio! I hadn't thought of explicitly calling the wide-character version by appending `W` to the function name. However, I am not sure how this would solve the original problem. I need to pass a UTF-8 encoded Ruby String into a Win32 function that expects multi-byte. With the example above, I wouldn't have that problem anymore with the actual call to `GetLongPathName()`, but I would just move the problem to my call into `MultiByteToWideChar()`. However, maybe the windows-pr guys solved my problem and the secret is hidden somewhere in their use of Win32API. – Timm Dapper Oct 22 '18 at 08:53
  • As a follow-up: after investigating `MultiByteToWideChar()` a little more closely, I realize that I was wrong. That function takes a code page identifier as input, so unlike the other Win32 functions it can work with any encoding, including UTF-8. So what you are suggesting should indeed work. It still seems like there should be a more elegant solution, but thank you so much for the pointers! – Timm Dapper Oct 22 '18 at 09:22

Instead of jumping through hoops with Win32 for this, I'd recommend using Sketchup.temp_dir (http://ruby.sketchup.com/Sketchup.html#temp_dir-class_method) to get the system temp path and generating your own unique name for your own temp subdirectory. It'll be a lot less fragile than converting between locales and making Win32 calls.
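A sketch of that suggestion, assuming the base folder comes from `Sketchup.temp_dir` when running inside SketchUp; elsewhere it falls back to Ruby's `Dir.tmpdir` so it can be tried outside SketchUp. The `myplugin` prefix and the helper name `make_unique_temp_subdir` are hypothetical. Note that this does not by itself turn a short base path into a long one.

```ruby
require 'tmpdir'
require 'securerandom'

# Hypothetical helper: create a fresh, uniquely named subfolder under base
# and return its path. Dir.mkdir raises Errno::EEXIST on a name clash,
# in which case another random suffix is tried.
def make_unique_temp_subdir(base, prefix = "myplugin")
  loop do
    name = "#{prefix}-#{Time.now.strftime('%Y%m%d')}-#{SecureRandom.hex(4)}"
    path = File.join(base, name)
    begin
      Dir.mkdir(path)
      return path
    rescue Errno::EEXIST
      # Name already taken; loop and try another suffix.
    end
  end
end

# Inside SketchUp, use its temp folder; elsewhere, Ruby's default temp folder.
base = defined?(Sketchup) ? Sketchup.temp_dir : Dir.tmpdir
dir = make_unique_temp_subdir(base)
puts dir
```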

thomthom
  • Unfortunately that does not solve our problem. `Sketchup.temp_dir` appears to use the same function that Ruby's `Dir.mktmpdir` is using internally. On all machines I tried it on, both functions just return the short folder name. – Timm Dapper Oct 22 '18 at 09:10
  • Interesting. I've not observed that. Can you log an issue at our issue tracker please? https://github.com/SketchUp/api-issue-tracker/issues – thomthom Oct 22 '18 at 10:54

Not sure if this is the best way, but here is what seemed to work for me. Basically, I just modified the function from the question to convert the input to the current system locale encoding and to convert the result back:

def self.get_long_win32_filename(short_name)
    require 'Win32API'
    max_path = 1024
    long_name = " " * max_path
    # Make sure the short_name is in the current system locale encoding,
    # because the (ANSI) Win32 API appears to always expect strings like that.
    short_name = short_name.encode(Encoding.find('locale'))
    lfn_size = Win32API.new("kernel32", "GetLongPathName", ['P','P','L'],'L').call(short_name, long_name, max_path)
    # If lfn_size is a valid value, shorten long_name to the actual length
    # (a byte count, hence byteslice), otherwise use short_name
    # (e.g. when a zero lfn_size indicates an error).
    long_name = (1...max_path).include?(lfn_size) ? long_name.byteslice(0, lfn_size) : short_name
    # Make sure the return string is in the correct encoding again.
    long_name.force_encoding(Encoding.find('locale'))
    return long_name.encode(Encoding::UTF_8)
end

One thing I am not sure about is whether I should use Encoding::find('locale') like I'm doing or Encoding::find('filesystem') because my strings are file names.
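Whichever of the two encodings is used, going through the ANSI function only works for names that the active ANSI code page can represent; `encode` raises `Encoding::UndefinedConversionError` otherwise, for example for a Hebrew user name on a Western European code page. A small check, sketched with a hypothetical helper `representable_in?`, makes that limit visible. It may also be worth verifying on the affected machines whether `Encoding.find('locale')` and `Encoding.find('filesystem')` actually differ there, since the `A` file functions use the ANSI code page unless the process has switched them to the OEM code page.

```ruby
# Hypothetical helper: true if str converts to encoding without loss.
def representable_in?(str, encoding)
  str.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

portuguese = "C:\\Users\\Jo\u00E3o"              # hypothetical user folder
hebrew     = "C:\\Users\\\u05E9\u05E8\u05D4"     # hypothetical user folder

puts representable_in?(portuguese, Encoding::Windows_1252)  # true
puts representable_in?(hebrew, Encoding::Windows_1252)      # false
puts representable_in?(hebrew, Encoding::Windows_1255)      # true
```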