Using Unicode strings in DllImport with a DLL written in Rust

Question

I am trying to call a DLL written in Rust from a C# program. The DLL has two simple functions that take strings (in different ways) and print them to the console.

Rust DLL code

#![crate_type = "lib"]
extern crate libc;

use libc::{c_char};
use std::ffi::CStr;

#[no_mangle]
pub extern fn printc(s: *const c_char){
    let c_str : &CStr = unsafe {
        assert!(!s.is_null());

        CStr::from_ptr(s)
    };

    println!("{:?}", c_str.to_bytes().len()); //prints "1" if unicode

    let r_str = std::str::from_utf8(c_str.to_bytes()).unwrap();
    println!("{:?}", r_str);
}

#[no_mangle]
pub extern fn print2(string: String) {
    println!("{:?}", string)
}

C# console program code

[DllImport("lib.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
static extern void print2(ref string str);

[DllImport("lib.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void printc(string str);

static void Main(string[] args)
{
  try
  {
    var graw = "yeyeye";
    printc(graw);
    print2(ref graw);
  }
  catch (Exception ex)
  {
    Console.WriteLine("calamity!, {0}", ex.Message);
  }
  Console.ReadLine();
}

The print2 function keeps printing garbage on screen until it causes an AccessViolationException.

The second function, printc, does print the string, but only if CharSet.Unicode is not set. If it is set, it prints only the first char, which is why println!("{:?}", c_str.to_bytes().len()); prints 1.

I believe that the CStr::from_ptr function does not support Unicode, which is why it returns only the first char of the string.

Any idea how to pass Unicode strings as parameters to Rust DLLs? Is it possible to make things simpler, as in the print2 function?

I am not familiar with C#, but does it use C-style strings? I would be surprised. I would imagine that this is the issue, you're trying to pass a C# string as a C string. — Steve Klabnik, Sep 27 '15 at 23:29
(and print2 can't work, because C doesn't know what Rust strings are and how to make them) — Steve Klabnik, Sep 27 '15 at 23:29
Looks like you'll need to use UTF-8. That's `Encoding.UTF8.GetBytes(str + "\0")` and marshaled as `[In] byte[]`. — David Heffernan, Sep 28 '15 at 05:23

DK. · Accepted Answer · 2015-09-30T03:44:27.830

7

If you check the documentation on CharSet, you'll see that CharSet.Unicode tells .NET to marshal strings as UTF-16 (i.e. two bytes per code unit). Thus, .NET is trying to pass printc what should be a *const u16, not a *const libc::c_char. When CStr goes to compute the length of the string, what it sees is the following:

b"y\0e\0y\0e\0y\0e\0"

That is, it sees one byte, then a null byte, so it stops; that is why it says the length is "1".
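To see the truncation concretely, here is a minimal sketch. The helper names utf16le_bytes and c_str_len are made up for illustration and appear nowhere in the question; the first builds the bytes that UTF-16 marshalling would produce for a string, the second measures them the same way printc does.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Encodes `s` as null-terminated UTF-16LE bytes, the layout that
/// CharSet.Unicode marshalling hands to the native side.
/// (Hypothetical helper, for illustration only.)
fn utf16le_bytes(s: &str) -> Vec<u8> {
    let mut bytes: Vec<u8> = s
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect();
    bytes.extend_from_slice(&[0, 0]); // UTF-16 null terminator
    bytes
}

/// Length that `CStr::from_ptr` reports when it scans `bytes` one byte at a time.
/// (Hypothetical helper, for illustration only.)
fn c_str_len(bytes: &[u8]) -> usize {
    // Guard so that the scan below cannot run past the end of the buffer.
    assert!(bytes.contains(&0), "buffer must contain a zero byte");
    let c_str = unsafe { CStr::from_ptr(bytes.as_ptr() as *const c_char) };
    c_str.to_bytes().len()
}
```

Calling c_str_len(&utf16le_bytes("yeyeye")) gives 1, while the same text as plain null-terminated bytes (b"yeyeye\0") gives 6.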

Rust has no standard UTF-16 string type, but if you're working on Windows, there are some conversion methods: search the docs for OsStrExt and OsStringExt. Note that you must use the docs that were installed with the compiler; the ones online won't include them.

Sadly, there's nothing for dealing directly with null-terminated UTF-16 strings. You'll need to write some unsafe code to turn a *const u16 into a &[u16] that you can pass to OsStringExt::from_wide.
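A rough sketch of that unsafe step might look like the following. The names wide_slice and wide_to_string are hypothetical. To keep the sketch runnable on any platform it decodes with String::from_utf16_lossy rather than the Windows-only OsStringExt::from_wide; on Windows the same slice could be handed to from_wide instead.

```rust
use std::slice;

/// Borrows a null-terminated UTF-16 buffer as a slice (terminator excluded).
/// (Hypothetical helper, for illustration only.)
///
/// Safety: `ptr` must be non-null, aligned for u16, and point to memory that
/// stays valid for 'a and contains a terminating 0.
unsafe fn wide_slice<'a>(ptr: *const u16) -> &'a [u16] {
    assert!(!ptr.is_null());
    let mut len = 0usize;
    unsafe {
        // Walk forward one code unit at a time until the terminator.
        while *ptr.add(len) != 0 {
            len += 1;
        }
        slice::from_raw_parts(ptr, len)
    }
}

/// Copies the UTF-16 text into an owned Rust String, replacing any
/// unpaired surrogates with U+FFFD.
/// (Hypothetical helper; an exported function would call this on the
/// pointer it receives.)
unsafe fn wide_to_string(ptr: *const u16) -> String {
    let wide = unsafe { wide_slice(ptr) };
    String::from_utf16_lossy(wide)
}
```

An exported function taking *const u16 could call wide_to_string on its argument and print the result, which would let the C# side keep CharSet.Unicode.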

Now, Rust does use Unicode, but it uses UTF-8. Sadly, there is no direct way to get .NET to marshal a string as UTF-8. Using any other encoding would appear to lose information, so you either have to explicitly deal with UTF-16 on the Rust side, or explicitly deal with UTF-8 on the C# side.

It's much simpler to re-encode the string as UTF-8 in C#. You can exploit the fact that .NET will marshal an array as a raw pointer to the first element (just like C) and pass a null-terminated UTF-8 string.

First, a static method for taking a .NET string and producing a UTF-8 string stored in a byte array:

static byte[] NullTerminatedUTF8bytes(string str)
{
    // Encoding lives in System.Text; the appended "\0" becomes the C-style terminator.
    return Encoding.UTF8.GetBytes(str + "\0");
}

Then declare the signature of the Rust function like this:

[DllImport(dllname, CallingConvention = CallingConvention.Cdecl)]
static extern void printc([In] byte[] str);

Finally, call it like this:

printc(NullTerminatedUTF8bytes(str));

For bonus points, you can rework printc to instead take a *const u8 and a u32, passing the re-encoded string plus its length; then you don't need the null terminator and can reconstruct the string using the std::slice::from_raw_parts function (but that's starting to go beyond the original question).
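A minimal sketch of that pointer-plus-length variant, under the assumption that the C# side passes the byte array and its length in bytes. The names str_from_parts and printc_len are hypothetical, and a real export would also carry the #[no_mangle] attribute as in the question.

```rust
use std::{slice, str};

/// Rebuilds a `&str` from a pointer and a byte length; no terminator needed.
/// (Hypothetical helper, for illustration only.)
///
/// Safety: `ptr` must be non-null and point to `len` readable bytes that
/// stay valid for 'a.
unsafe fn str_from_parts<'a>(ptr: *const u8, len: u32) -> Result<&'a str, str::Utf8Error> {
    assert!(!ptr.is_null());
    let bytes = unsafe { slice::from_raw_parts(ptr, len as usize) };
    // Validation can fail, so the error is returned instead of unwrapped.
    str::from_utf8(bytes)
}

/// Sketch of the reworked function (hypothetical name).
pub extern "C" fn printc_len(s: *const u8, len: u32) {
    match unsafe { str_from_parts(s, len) } {
        Ok(r_str) => println!("{:?}", r_str),
        Err(e) => println!("invalid UTF-8: {}", e),
    }
}
```

The length here is the number of bytes in the UTF-8 array, which differs from the .NET string's Length whenever the text contains non-ASCII characters. Because nothing scans for a terminator, interior null characters pass through intact.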

As for print2, that one is just unworkable. .NET knows nothing about Rust's String type, and it is in no way compatible with .NET strings. More than that, String doesn't even have a guaranteed layout, so binding to it safely is more or less not possible.

All that is a very long-winded way of saying: don't use String, or any other non-FFI-safe type, in cross-language functions, ever. If your intention here was to pass an "owned" string into Rust... I don't know if it's even possible to do in concert with .NET.

Aside: "FFI-safe" in Rust essentially boils down to: is either a built-in primitive type (such as an integer, a float, or a raw pointer to an FFI-safe type), or is a user-defined type with #[repr(C)] attached to it. Sadly, the "FFI-safe"-ness of a type isn't included in the documentation.
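As an illustration of the #[repr(C)] case, here is a small sketch of a user-defined type that could cross the boundary by value. Utf8View and view_len are hypothetical names, not part of the original code.

```rust
/// Hypothetical FFI-safe view of a UTF-8 buffer: #[repr(C)] fixes the
/// field order and layout, and both fields are primitive types.
#[repr(C)]
pub struct Utf8View {
    pub ptr: *const u8,
    pub len: u32,
}

/// Hypothetical function taking the struct by value across the boundary.
/// A real export would also need the #[no_mangle] attribute.
pub extern "C" fn view_len(v: Utf8View) -> u32 {
    v.len
}
```

A matching struct on the C# side would need sequential layout with the same field order and field sizes.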

edited Sep 30 '15 at 03:44

answered Sep 28 '15 at 05:11

DK.


It's easy to get .NET to marshal as UTF-8. MS has never pretended UTF-8 does not exist. `Encoding.UTF8.GetBytes` is what is needed here. Marshal as `byte[]`. Manual zero termination needed. Are you aware that when MS added Unicode support to Windows, UTF-8 actually did not exist? – David Heffernan Sep 28 '15 at 05:20

@DavidHeffernan I dunno; it keeps treating CP 65001 like it doesn't really exist and no one should use it, not to mention PowerShell's pig-headed stubbornness in encoding all text it outputs as UTF-16 despite just about nothing being able to read it. I didn't consider re-encoding the string on the C# side because I was sticking to what you can do *just* with the dllimport annotation and *also* because I don't have a C# setup to actually test anything. As for that last point; I know UTF-8 didn't exist, but it doesn't make it any less of a frustration *now*. Aah, hindsight. :P – DK. Sep 28 '15 at 05:27

@DavidHeffernan I can't provide any (tested) help on the C# side; I've made the answer a community wiki if you want to make some changes on that end. – DK. Sep 28 '15 at 05:29

OK, let's see if we can wiki this between us. I know C#, you know Rust. – David Heffernan Sep 28 '15 at 05:36

The edit I made is compatible with `void __cdecl print(char* str)` where `str` is null terminated and encoded as UTF-8. – David Heffernan Sep 28 '15 at 05:44

@DavidHeffernan Just occurred to me: this still has the issue of truncating any string that contains interior nulls. Presumably, the signature could be changed to `[In] byte[], uint` / `*const u8, u32` to fix that. That would *also* simplify the code on the Rust side a little. Is there any easy way to abstract that? I'm assuming the best you could do is do the UTF-8 transcode, store it, then pass that plus `str.Length`. – DK. Sep 28 '15 at 05:50

That's a little harder to abstract because there are two arguments. But yes, you'd do it the way you describe. Generally this is a non issue. It's often no problem to ban interior nulls. – David Heffernan Sep 28 '15 at 05:55

@DavidHeffernan Aye, I've decided against further additions; it was starting to drift from the original question. Thanks for your help. :) – DK. Sep 28 '15 at 06:01

Thank you guys for the answer, and sorry for the trouble; I will try it once I am home. I am just wondering what the difference is between `[in]` and `in` in C#. – IdontCareAboutReputationPoints Sep 28 '15 at 17:07

Well, the latter is not part of C#. Perhaps you are thinking of `out`. As for `[In]`, that could be omitted, but I'm using it here almost as documentation. It tells the marshaller to marshal the data to the callee, but not back from the callee. In reality, though, arrays of blittable types are pinned for marshalling, so they are always `[In,Out]`. – David Heffernan Sep 28 '15 at 18:16

@DavidHeffernan Thank you for your support again. I never thought that in C# you can apply attributes to function parameters; I mistook `[In]` for the `out` modifier. I learned a lot from you guys today, and there is still a lot I should learn about data marshalling. Another day lived, new things learned. – IdontCareAboutReputationPoints Sep 28 '15 at 21:01

Just an addition, `Encoding.UTF8.GetBytes(str)` could also do the work without trailing `\0` – IdontCareAboutReputationPoints Sep 29 '15 at 07:11

@MusuNaji No. It might just happen to work by accident but you've got a buffer overrun there. Put the null terminator back. – David Heffernan Sep 29 '15 at 18:41

I'm disappointed that the answer says this: "Sadly, there is no direct way to get .NET to marshal a string as UTF-8. Using any other encoding would appear to lose information, so you're struck dealing with UTF-16 in Rust, I'm afraid." I really wish the Rust experts could tidy up that part of the answer. I did my bit!! – David Heffernan Sep 29 '15 at 18:43

Thanks. That's a lot better! – David Heffernan Sep 30 '15 at 05:21
