
Editor's note: This code example is from a version of Rust prior to 1.0 and is not valid Rust 1.0 code, but the answers still contain valuable information.

I want to pass a string literal to a Windows API. Many Windows functions use UTF-16 as the string encoding, while Rust's native strings are UTF-8.

I know Rust has utf16_units() to produce an iterator of UTF-16 code units, but I don't know how to use that function to produce a UTF-16 string with a terminating zero.

I'm producing the UTF-16 string like this, but I am sure there is a better way:

extern "system" {
    pub fn MessageBoxW(hWnd: int, lpText: *const u16, lpCaption: *const u16, uType: uint) -> int;
}

pub fn main() {
    let s1 = [
        'H' as u16, 'e' as u16, 'l' as u16, 'l' as u16, 'o' as u16, 0 as u16,
    ];
    unsafe {
        MessageBoxW(0, s1.as_ptr(), 0 as *const u16, 0);
    }
}
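Casting each character with as u16, as the array above does, also has a correctness limit: as keeps only the low 16 bits of a char, so characters outside the Basic Multilingual Plane (emoji, for example) come out mangled instead of as surrogate pairs. A small comparison sketch, using the stable str::encode_utf16 (Rust 1.8+) as the reference; both function names are illustrative only.

```rust
/// Mimics the question's approach: cast each char straight to u16.
/// `as` truncates to the low 16 bits, so chars above U+FFFF are corrupted
/// instead of being split into a surrogate pair.
fn cast_each_char(s: &str) -> Vec<u16> {
    s.chars().map(|c| c as u16).collect()
}

/// Encoder-based alternative for comparison: emits surrogate pairs correctly.
fn encode_properly(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

For ASCII text such as "Hello" the two agree; for U+1F600 the cast yields the single unit 0xF600, while the encoder yields the pair 0xD83D, 0xDE00.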
Asked by Gigih Aji Ibrahim, edited by Shepmaster

3 Answers


Rust 1.8+

str::encode_utf16 returns an iterator over the string's UTF-16 code units and has been stable since Rust 1.8.

Call collect() on that iterator to build a Vec<u16>, then push(0) onto that vector to add the terminating zero:

pub fn main() {
    let s = "Hello";

    let mut v: Vec<u16> = s.encode_utf16().collect();
    v.push(0);
}
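A possible variant, sketched under the assumptions noted here, chains the terminator onto the iterator with std::iter::once so no mutable binding is needed. The helper name to_wide_null is hypothetical, and the Windows-only binding is hand-written from the documented Win32 prototype int MessageBoxW(HWND, LPCWSTR, LPCWSTR, UINT) rather than taken from a bindings crate, so treat it as an illustration.

```rust
use std::iter;

/// Hypothetical helper: encode `s` as UTF-16 and append a terminating zero.
/// An interior '\0' in `s` would make Windows see a shorter string.
fn to_wide_null(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(iter::once(0)).collect()
}

// Hand-written binding, compiled only on Windows. Assumes the Win32 prototype
// int MessageBoxW(HWND, LPCWSTR, LPCWSTR, UINT) exported by user32.dll.
#[cfg(windows)]
mod win {
    use std::ffi::c_void;

    #[link(name = "user32")]
    extern "system" {
        pub fn MessageBoxW(
            hwnd: *mut c_void,
            text: *const u16,
            caption: *const u16,
            utype: u32,
        ) -> i32;
    }

    pub fn show(text: &str, caption: &str) -> i32 {
        // Keep the Vecs in locals so they outlive the call below.
        let text = super::to_wide_null(text);
        let caption = super::to_wide_null(caption);
        // SAFETY: both buffers are NUL-terminated and alive for the whole call;
        // a null owner window handle is allowed.
        unsafe { MessageBoxW(std::ptr::null_mut(), text.as_ptr(), caption.as_ptr(), 0) }
    }
}
```

Whatever produces the Vec<u16>, it has to outlive the call: a temporary such as to_wide_null("Hi").as_ptr() used directly as an argument lives until the end of that statement, but storing only the pointer and dropping the Vec leaves it dangling.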

Rust 1.0+

Before Rust 1.8, str::utf16_units() / str::encode_utf16 was unstable. The alternative is to either switch to nightly (a viable option if you're writing a program, not a library) or use an external crate like encoding:

extern crate encoding;

use std::slice;

use encoding::all::UTF_16LE;
use encoding::{Encoding, EncoderTrap};

fn main() {
    let s = "Hello";

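    let mut v: Vec<u8> = UTF_16LE.encode(s, EncoderTrap::Strict).unwrap();
    // Two zero bytes form one u16 NUL terminator.
    v.push(0);
    v.push(0);
    // Reinterpret the bytes as u16s. This assumes the host is little endian
    // and that the buffer is 2-byte aligned, which Vec<u8> does not guarantee.
    let s: &[u16] = unsafe { slice::from_raw_parts(v.as_ptr() as *const _, v.len()/2) };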
    println!("{:?}", s);
}

(or you can use slice::from_raw_parts_mut with v.as_mut_ptr() if you want a &mut [u16]).

However, in this particular example you have to be careful with endianness. The UTF_16LE encoding gives you a vector of bytes representing u16s in little-endian byte order, while the from_raw_parts trick lets you "view" that vector of bytes as a slice of u16s in your platform's byte order, which may be big endian. Using a crate like byteorder may help here if you want complete portability. The cast also assumes the byte buffer is aligned for u16: slice::from_raw_parts requires an aligned pointer, and a Vec<u8> only guarantees 1-byte alignment, so a misaligned buffer would be undefined behavior.
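One way around both the byte-order and the alignment concerns, without another crate, is to decode the little-endian byte pairs explicitly into a new Vec<u16> instead of reinterpreting the buffer in place. A minimal sketch, assuming the input is the byte vector returned by UTF_16LE.encode (with or without the two trailing zero bytes). The function name is illustrative, and u16::from_le_bytes requires Rust 1.32 or newer (on older compilers, (pair[1] as u16) << 8 | pair[0] as u16 computes the same value).

```rust
/// Convert little-endian UTF-16 bytes into native u16 code units.
/// The result is copied into a new Vec, so neither the alignment of the
/// input nor the host byte order matters. A trailing odd byte (malformed
/// input) is silently dropped by chunks_exact.
fn le_bytes_to_u16(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}
```

Because the result is an owned, properly aligned Vec<u16>, its as_ptr() can be handed to a W-suffixed Windows function as long as the Vec outlives the call.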

This discussion on Reddit may also be helpful.

Answered by Vladimir Matveev, edited by Shepmaster
  • Wow, that works! Thanks. Previously I used `let mut v = s.utf16_units().collect::<u16>();` but the code failed to compile. – Gigih Aji Ibrahim Aug 08 '14 at 07:44
  • @GigihAjiIbrahim, it failed to compile because `collect()`'s type argument should be the target collection, not the element type. `collect::<Vec<u16>>()` would have worked too. – Vladimir Matveev Aug 08 '14 at 07:59
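The point about collect's type argument can be seen side by side: collect is generic over the collection it builds, so the turbofish names the container rather than the element type. A minimal sketch using str::encode_utf16, the stable replacement for utf16_units; the function names are illustrative only.

```rust
/// The turbofish names the target collection, not the element type.
fn with_turbofish(s: &str) -> Vec<u16> {
    s.encode_utf16().collect::<Vec<u16>>()
}

/// Here the annotated binding drives inference instead;
/// `collect::<Vec<_>>()` would also work and lets the element type be inferred.
fn with_annotation(s: &str) -> Vec<u16> {
    let v: Vec<u16> = s.encode_utf16().collect();
    v
}
```

Both forms produce the same Vec<u16>; which one to use is a matter of style.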

Rust 1.46+

For static UTF-16 strings, the utf16_lit crate provides an easy-to-use macro that does this conversion at compile time:

use utf16_lit::utf16_null;

fn main() {
    let s = &utf16_null!("Hello");
    println!("{:?}", s);
}
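The macro does the conversion at compile time. To show what that involves (a hand-rolled sketch, not necessarily how utf16_lit works internally), a const fn can decode the UTF-8 bytes of a &str and write UTF-16 code units, including surrogate pairs, into a fixed-size array. Both function names are hypothetical; the sketch relies on loops and if in const fn (Rust 1.46) and on const generics for the array length (Rust 1.51).

```rust
/// Hypothetical helper: number of u16 slots `s` needs in UTF-16, plus one for the NUL.
const fn utf16_len_with_nul(s: &str) -> usize {
    let bytes = s.as_bytes();
    let mut i = 0;
    let mut units = 1; // start at 1 to reserve room for the terminator
    while i < bytes.len() {
        let lead = bytes[i];
        if lead < 0x80 {
            i += 1; // 1-byte sequence (ASCII): one UTF-16 unit
            units += 1;
        } else if lead < 0xE0 {
            i += 2; // 2-byte sequence: one unit
            units += 1;
        } else if lead < 0xF0 {
            i += 3; // 3-byte sequence: one unit
            units += 1;
        } else {
            i += 4; // 4-byte sequence: outside the BMP, needs a surrogate pair
            units += 2;
        }
    }
    units
}

/// Hypothetical helper: encode `s` into a NUL-terminated UTF-16 array.
/// `N` should equal utf16_len_with_nul(s); a smaller N makes the indexing
/// panic, which is a compile error when evaluated in a const or static.
const fn encode_utf16_nul<const N: usize>(s: &str) -> [u16; N] {
    let b = s.as_bytes();
    let mut out = [0u16; N]; // zero-filled, so the last slot is already the terminator
    let mut i = 0;
    let mut j = 0;
    while i < b.len() {
        // Decode one UTF-8 sequence into a Unicode scalar value (&str is valid UTF-8).
        let lead = b[i] as u32;
        let (cp, width): (u32, usize) = if lead < 0x80 {
            (lead, 1)
        } else if lead < 0xE0 {
            (((lead & 0x1F) << 6) | (b[i + 1] as u32 & 0x3F), 2)
        } else if lead < 0xF0 {
            (((lead & 0x0F) << 12)
                | ((b[i + 1] as u32 & 0x3F) << 6)
                | (b[i + 2] as u32 & 0x3F), 3)
        } else {
            (((lead & 0x07) << 18)
                | ((b[i + 1] as u32 & 0x3F) << 12)
                | ((b[i + 2] as u32 & 0x3F) << 6)
                | (b[i + 3] as u32 & 0x3F), 4)
        };
        if cp < 0x1_0000 {
            out[j] = cp as u16;
            j += 1;
        } else {
            // Split supplementary-plane characters into a surrogate pair.
            let v = cp - 0x1_0000;
            out[j] = 0xD800 | (v >> 10) as u16;
            out[j + 1] = 0xDC00 | (v & 0x3FF) as u16;
            j += 2;
        }
        i += width;
    }
    out
}

// Usage: the array length comes from the same input, so the two always agree.
static HELLO_W: [u16; utf16_len_with_nul("Hello")] = encode_utf16_nul("Hello");
```

Using a static rather than a const for the result gives the array a single fixed address, which is convenient when a pointer to it is passed to Windows.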
Answered by ChrisD

If you are using literals, you can use the w! macro from windows-sys, which yields a pointer to a NUL-terminated UTF-16 string: https://docs.rs/windows-sys/latest/windows_sys/macro.w.html

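use windows_sys::w;
use windows_sys::Win32::UI::WindowsAndMessaging::MessageBoxW;

// Assumes a windows-sys release with w! at the crate root and HWND = isize (e.g. 0.48).
unsafe { MessageBoxW(0, w!("Hello"), std::ptr::null(), 0) };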
Answered by proski
  • I think that in current Rust versions this is the best answer if we are dealing with the Windows API. I never thought my question from 9 years ago would still be useful. – Gigih Aji Ibrahim Jun 20 '23 at 05:33