I noticed there are two ways of handling LPWSTR output types in the windows-rs crate. Both start by marshalling the string into a slice:

let len = (0..).take_while(|&i| *ptr.offset(i) != 0).count();
let slice = std::slice::from_raw_parts(ptr, len);

Then it seems we can do one of:

let output = OsString::from_wide(slice).to_string_lossy().into_owned();

or

let output = String::from_utf16_lossy(slice);

I'm guessing the first option is more correct, but to use it you have to bring std::os::windows::prelude::OsStringExt into scope. The problem for me right now is that doing so prevents rust-analyzer from linting due to a bug: https://github.com/rust-analyzer/rust-analyzer/issues/6063

Using the latter method sidesteps this and improves my development workflow. Is the latter method likely to cause me any problems?
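
For concreteness, here is a minimal sketch of the second approach wrapped in one helper. It assumes the pointer is non-null and the buffer is NUL-terminated; the name `wide_ptr_to_string_lossy` is hypothetical, and it takes a plain `*const u16` rather than any windows-rs wrapper type.

```rust
/// Hypothetical helper (not part of windows-rs): copies a NUL-terminated
/// UTF-16 buffer into a `String`, replacing unpaired surrogates with U+FFFD.
///
/// # Safety
/// `ptr` must be non-null and point to a readable, NUL-terminated `u16` buffer.
unsafe fn wide_ptr_to_string_lossy(ptr: *const u16) -> String {
    unsafe {
        // Count code units up to (not including) the terminating NUL.
        let mut len = 0usize;
        while *ptr.add(len) != 0 {
            len += 1;
        }
        // Borrow the buffer as a slice, then decode it into an owned String.
        let slice = std::slice::from_raw_parts(ptr, len);
        String::from_utf16_lossy(slice)
    }
}
```

This only needs `String::from_utf16_lossy` from the standard library, so nothing from `std::os::windows` has to be imported.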

Paul Dempsey
Dragoon
  • The latter doesn't cause you any more problems than the former. They are both lossy, in that they are dropping invalid code unit sequences. The question is: What are you trying to accomplish? – IInspectable Jun 29 '21 at 21:46
  • Just reading some attributes from active directory. Can't use the ldap3 crate due to no integrated authentication support, so using win32 instead. – Dragoon Jun 29 '21 at 22:01
  • Either way, you are dropping non-Unicode input sequences that get replaced with `U+FFFD`. I'm not familiar with LDAP's rules to know whether this is an issue, i.e. whether LDAP is allowed to produce non-Unicode sequences. Just to make sure that you understand what `OsString` really is: It stores strings using a relaxed version of UTF-8 ([WTF-8](https://en.wikipedia.org/wiki/UTF-8#WTF-8)), and doesn't use Windows' platform-native encoding (UTF-16). By the time you call `to_string_lossy` all benefits of using `OsString` are gone. – IInspectable Jun 30 '21 at 06:42
  • Is there a proper way to get a normal Rust string out of OsString if it has invalid UTF-16 characters? Basically at this point it is either being presented to the user or possibly used for other purposes, but either way it needs to end up as either a normal String or &str. – Dragoon Jun 30 '21 at 16:50
  • Use [into_string](https://doc.rust-lang.org/std/ffi/struct.OsString.html#method.into_string). If nothing else at least you aren't silently ignoring conversion errors. – IInspectable Jun 30 '21 at 17:26
  • Rust assumes strings are valid UTF-8. A Rust string containing invalid UTF-8 is expressly Undefined Behavior (UB): https://doc.rust-lang.org/reference/types/textual.html. What's "correct" really depends on what you're expecting to do with the string. If you're just showing it to users, then having U+FFFD where it isn't valid Unicode is acceptable. If you need to round-trip that back to Windows APIs, that's not going to work, as you've lost information (hence the name `to_string_lossy`). – Paul Dempsey Nov 26 '22 at 01:51
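
The point made in the comments about not silently ignoring conversion errors can also be sketched without `OsString`, since `String::from_utf16` is the strict counterpart of `String::from_utf16_lossy`. The helper below is a rough illustration only: the name `wide_to_string_checked` is hypothetical, and returning the original code units on failure is an assumption made here so the data could still be passed back to a wide-string API.

```rust
/// Hypothetical helper: strict UTF-16 to `String` conversion.
/// On invalid input (for example an unpaired surrogate) it returns the
/// original code units unchanged instead of substituting U+FFFD.
fn wide_to_string_checked(units: &[u16]) -> Result<String, Vec<u16>> {
    // `String::from_utf16` fails rather than replacing invalid sequences.
    String::from_utf16(units).map_err(|_| units.to_vec())
}
```

The caller can then decide whether to fall back to the lossy conversion for display or to keep the raw `u16` data for round-tripping.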

0 Answers