
I have some code in an FFI module which converts Rust strings to CStrings to be sent back to the caller.

I'm finding the calling convention of CString::new() difficult to use. Specifically, its documentation says:

This function will return an error if the supplied bytes contain an internal 0 byte. The NulError returned will contain the bytes as well as the position of the nul byte.

Basically if you don't want to use unwrap() and want to call this function safely, you'd have to do something like:

let cstr = match CString::new(message) {
    Ok(cstr) => { cstr }
    Err(_) => { CString::new("error converting string!").unwrap() }
};

In theory this could be better: the Err value contains the index of the null byte, so I could truncate the string at that point. But this gets to be a bit involved.

Why? Because the first call to CString::new(message) consumes the string, so I'd have to clone the string before making the call just so I could use it again inside the Err arm of the match, which is a code path that should never get called anyway.

I've considered scanning for nulls beforehand, but my knowledge of Unicode is limited and I'm reluctant to just remove null bytes from a Unicode string.

Basically, my question is: is there a better way? I wish that CString::new(message) would just truncate the string at the first null. It could give you back the truncated CString in the Err value.

Maybe there is another call which is easier to use. Am I missing something?

(Edit: I am only passing ASCII strings to this call, by the way.)

Edit: I went with this solution based on the accepted answer by @kmdreko:

pub unsafe fn log_to_c_callback(level: LogLevel, mut message: String)  {
    if let Some(callback) = LOGGER_CALLBACK {
        message.retain(|c| c != '\0');
        let len = message.len();

        #[allow(clippy::unwrap_used)]
        // Unwrap is safe because we removed all the null bytes above.
        let message_cstr = CString::new(message).unwrap();
        
        callback(level, message_cstr.as_ptr(), len as i32);
    }
}
pnadeau
  • Not every Rust string can be a CString, so that probably necessitates the `Result` response. You can always `.ok()` to turn it into an `Option` which avoids the panic crash. – tadman Oct 02 '22 at 21:47
  • ASCII strings can contain NUL, it's literally part of the ASCII standard, so... – tadman Oct 02 '22 at 21:48
  • Yeah I get that not all input strings can safely be made into CStrings, it's just the ergonomics I have an issue with here. @tadman, can you give an example of how `.ok()` would be used? I've never used it before. – pnadeau Oct 02 '22 at 21:51
  • I think the real solution here is to think in terms of the context in which this operation is performed. What does the wrapper function look like? Does it return a `Result`? Could you fold this kind of error into the possible outcomes of that function? Remember, by default Rust tries to force you to handle errors that might occur, and if you're absolutely certain that will not happen, you need to assume responsibility, like via `.unwrap()` or by `.expect()` if that suits your style better. – tadman Oct 02 '22 at 21:51
  • Personally, on the off chance that a NUL byte does slip into this string, and the behaviour you want is "just deal", then I'd write an error handler that snips out the offending character(s) and tries again. That way there's no remaining failure conditions to handle. – tadman Oct 02 '22 at 21:54
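
As a rough illustration of the `.ok()` suggestion in the comments above, here is a minimal sketch. The helper names `to_cstring_opt` and `to_cstring_or_empty` are hypothetical and exist only for this example; `Result::ok`, `Result::unwrap_or_default`, and the `Default` impl for `CString` (an empty C string) are standard library features.

```rust
use std::ffi::CString;

/// Hypothetical helper: converts the `Result` into an `Option`, so an
/// interior NUL byte yields `None` instead of an error or a panic.
fn to_cstring_opt(message: &str) -> Option<CString> {
    CString::new(message).ok()
}

/// Hypothetical helper: falls back to an empty C string when the
/// conversion fails, which also avoids `unwrap()`.
fn to_cstring_or_empty(message: &str) -> CString {
    CString::new(message).unwrap_or_default()
}
```

Both variants discard the message when it contains a NUL, so they fit only when losing the text is acceptable.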

1 Answer


I've considered scanning for nulls beforehand, but my knowledge of Unicode is limited and I'm reluctant to just remove null bytes from a Unicode string.

My knowledge of Unicode is less limited. You can safely search for and remove null characters. In UTF-8, the bytes of a multi-byte character are never zero (see Wikipedia on UTF-8 encoding), and even if they could be, Rust chars are Unicode scalar values, not simple bytes, so comparing against '\0' only ever matches an actual null character.

let mut message = String::from("hello w\0rld");
message.retain(|c| c != '\0');
let cstr = CString::new(message).expect("should not have null characters");
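
To make the UTF-8 claim concrete, here is a small sketch that checks whether a character's encoding contains a zero byte. The function name `utf8_has_zero_byte` is made up for this example; `char::encode_utf8` is the standard library call.

```rust
/// Hypothetical check: returns true if the UTF-8 encoding of `c`
/// contains a zero byte anywhere.
fn utf8_has_zero_byte(c: char) -> bool {
    // Four bytes is the maximum length of a UTF-8 encoded scalar value.
    let mut buf = [0u8; 4];
    // `encode_utf8` returns only the bytes actually used, so the unused
    // zero-initialised tail of `buf` is not inspected.
    c.encode_utf8(&mut buf).as_bytes().contains(&0)
}
```

Only U+0000 itself should report a zero byte, which is why removing '\0' characters cannot corrupt any other character.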

You might take this opportunity to filter out other unsavory characters like control characters, newlines, whatever you fancy.
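
For instance, a sketch that drops every control character rather than only nulls might look like the following. The name `sanitize_for_c` is hypothetical; `char::is_control` is standard and is true for U+0000, so the later conversion is not expected to fail.

```rust
use std::ffi::CString;

/// Hypothetical helper: strips all control characters (NUL, newlines,
/// tabs, ...) before converting to a C string.
fn sanitize_for_c(mut message: String) -> CString {
    message.retain(|c| !c.is_control());
    // NUL is a control character, so none can remain at this point.
    CString::new(message).expect("control characters, including NUL, were removed")
}
```

Whether newlines and tabs should really be removed depends on what the C side does with the text, so treat the predicate as something to adjust.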


If you really don't want an .unwrap()/.expect(), you can use your original plan but without the cloning. The NulError type also returns the original "string" via .into_vec():

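use std::ffi::CString;

let message = String::from("hello w\0rld");

// Work on the raw bytes so the buffer can be recovered from the error.
let mut message = message.into_bytes();
let cstr = loop {
    match CString::new(message) {
        Ok(cstr) => break cstr,
        Err(err) => {
            // Read the position first: into_vec() consumes the error.
            let idx = err.nul_position();
            // Take the original buffer back (no clone needed) and drop the null byte.
            message = err.into_vec();
            message.remove(idx);
        }
    }
};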
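
If truncating at the null, as the question originally wished for, is preferred over removing it, the same error value supports that too. This is only a sketch: `truncate_at_nul` is a hypothetical name, and it assumes `nul_position()` reports the first null byte, with `unwrap_or_default()` as a panic-free fallback to an empty C string in case that assumption does not hold.

```rust
use std::ffi::CString;

/// Hypothetical helper: keeps only the bytes before the reported NUL.
fn truncate_at_nul(message: String) -> CString {
    match CString::new(message) {
        Ok(cstr) => cstr,
        Err(err) => {
            // Read the position before `into_vec()` consumes the error.
            let idx = err.nul_position();
            let mut bytes = err.into_vec();
            bytes.truncate(idx);
            // Assumed NUL-free after truncation; falls back to an empty
            // C string rather than panicking if it is not.
            CString::new(bytes).unwrap_or_default()
        }
    }
}
```

Calling it with "hello w\0rld" would be expected to produce a C string holding "hello w", with no clone and no unwrap().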
kmdreko