Rust compiler generating intrinsic llvm.add call instruction while clang generates normal add?

Question

While working with LLVM IR, I noticed that when compiling a simple addition in C, clang generates a normal LLVM add instruction. However, when I compile the same code written in Rust, rustc generates a call to

%38 = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %37, i32 5), !dbg !597
%39 = extractvalue { i32, i1 } %38, 0, !dbg !597
%40 = extractvalue { i32, i1 } %38, 1, !dbg !597
%41 = call i1 @llvm.expect.i1(i1 %40, i1 false), !dbg !597
br i1 %41, label %panic1, label %bb9, !dbg !597

followed by two extractvalue instructions and the corresponding error handling in case an overflow has occurred. Why does it do that? As far as I understand, the normal add instruction also handles overflow, through the nsw keyword:

If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.

As I understand it, when the IR is further lowered to assembly, it will result in the same code?

*"If the nuw and/or nsw keywords are present, the result value of the add is a poison value"* - that's not what Rust does on integer overflow, it will trigger a panic (in debug mode) or wrap (in release mode). Related: https://stackoverflow.com/questions/60238060/is-signed-integer-overflow-in-safe-rust-in-release-mode-considered-as-undefined — kmdreko, Jul 22 '22 at 17:02

silvergasp · Answer 1 · 2022-07-22T19:36:02.050

TL;DR:

As I understand it, when the IR is further lowered to assembly, it will result in the same code?

No, it will not. rustc (in debug mode) ~= clang + the undefined behaviour sanitiser (UBSAN).

Explanation

In debug mode, rustc generates code that detects integer overflow and panics, e.g.

pub fn bad_add(num: i32) -> i32 {
    num + i32::MAX
}

Results in:

define i32 @_ZN7example7bad_add17ha9c5f96e25ec3c52E(i32 %num) unnamed_addr #0 !dbg !5 {
start:
  %0 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %num, i32 2147483647), !dbg !10
  %_3.0 = extractvalue { i32, i1 } %0, 0, !dbg !10
  %_3.1 = extractvalue { i32, i1 } %0, 1, !dbg !10
  %1 = call i1 @llvm.expect.i1(i1 %_3.1, i1 false), !dbg !10
  br i1 %1, label %panic, label %bb1, !dbg !10

bb1:                                              ; preds = %start
  ret i32 %_3.0, !dbg !11

panic:                                            ; preds = %start
  call void @_ZN4core9panicking5panic17hab046c3856b52f65E([0 x i8]* align 1 bitcast ([28 x i8]* @str.0 to [0 x i8]*), i64 28, %"core::panic::location::Location"* align 8 bitcast (<{ i8*, [16 x i8] }>* @alloc7 to %"core::panic::location::Location"*)) #4, !dbg !10
  unreachable, !dbg !10
}
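
As a rough illustration, the debug-mode IR above corresponds to something like the following hand-written Rust. This is only a sketch: `add_like_debug_build` is a hypothetical name, and the panic message is an assumption about what the generated check reports rather than something taken from the IR.

```rust
/// Hypothetical helper that spells out, in source form, the check that
/// rustc emits for `+` in debug mode.
pub fn add_like_debug_build(num: i32, rhs: i32) -> i32 {
    // `overflowing_add` returns (wrapped result, overflow flag), the same
    // pair shape as the `{ i32, i1 }` from `llvm.sadd.with.overflow.i32`.
    let (sum, overflowed) = num.overflowing_add(rhs);
    if overflowed {
        // Corresponds to the `panic` block in the IR (message assumed).
        panic!("attempt to add with overflow");
    }
    // Corresponds to the `bb1` block: return the non-overflowed sum.
    sum
}
```

Calling `add_like_debug_build(1, i32::MAX)` panics, while `add_like_debug_build(-1, i32::MAX)` returns `i32::MAX - 1`.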

However, in release mode (e.g. adding -C opt-level=3), we get:

define i32 @_ZN7example7bad_add17ha9c5f96e25ec3c52E(i32 %num) unnamed_addr #0 !dbg !5 {
  %0 = add i32 %num, 2147483647, !dbg !10
  ret i32 %0, !dbg !11
}

Note that the overflow check and the call to panic are now gone. The plain add carries no nsw flag, so it simply wraps on overflow.
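
If the behaviour should not depend on the build profile, the standard library's explicit integer methods can be used instead of `+`. A minimal sketch, with `add_wrapping` and `add_checked` as hypothetical function names:

```rust
/// Always wraps on overflow, in debug and release builds alike,
/// matching the plain `add` in the release-mode IR.
pub fn add_wrapping(num: i32) -> i32 {
    num.wrapping_add(i32::MAX)
}

/// Always reports overflow as `None` instead of panicking or wrapping.
pub fn add_checked(num: i32) -> Option<i32> {
    num.checked_add(i32::MAX)
}
```

Here `add_wrapping(1)` yields `i32::MIN`, and `add_checked(1)` yields `None`.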

With C/clang, we don't get exactly the same result, e.g.

#include <limits.h>

// INT_MAX + num overflows for any num > 0, which is undefined behaviour in C.
int bad_add(int num) {
    return INT_MAX + num;
}

Will result in:

define dso_local i32 @bad_add(i32 %0) #0 {
  %2 = alloca i32, align 4
  store i32 %0, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  %4 = add nsw i32 2147483647, %3
  ret i32 %4
}

To generate similar code in C, you can enable UBSAN, e.g. by adding -fsanitize=undefined, or more specifically just the signed integer overflow check with -fsanitize=signed-integer-overflow, to your command line. This is usually enabled when running fuzz tests.

Enabling UBSAN with clang, we get very similar (though not identical) output to rustc in debug mode:

define dso_local i32 @bad_add(i32 %0) #0 {
  %2 = alloca i32, align 4
  store i32 %0, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  %4 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 2147483647, i32 %3), !nosanitize !2
  %5 = extractvalue { i32, i1 } %4, 0, !nosanitize !2
  %6 = extractvalue { i32, i1 } %4, 1, !nosanitize !2
  %7 = xor i1 %6, true, !nosanitize !2
  br i1 %7, label %10, label %8, !prof !3, !nosanitize !2

8:                                                ; preds = %1
  %9 = zext i32 %3 to i64, !nosanitize !2
  call void @__ubsan_handle_add_overflow(i8* bitcast ({ { [10 x i8]*, i32, i32 }, { i16, i16, [6 x i8] }* }* @1 to i8*), i64 2147483647, i64 %9) #3, !nosanitize !2
  br label %10, !nosanitize !2

10:                                               ; preds = %8, %1
  ret i32 %5
}

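Note that we now get the same call to llvm.sadd.with.overflow for the C function with UBSAN enabled. Also, you'll notice that __ubsan_handle_add_overflow reports the problem at runtime. In the IR above, control branches on to %10 after the handler returns, so in this configuration the program keeps running after the report; building with -fno-sanitize-recover=undefined makes it abort instead. This is broadly similar to Rust's panic.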

Not exactly, Rust panics can (and do by default) unwind, not abort. — Chayim Friedman, Jul 23 '22 at 22:38
Is this in reference to the last sentence? I admit I'm not an expert on the differences between panic/abort. So I'm open to learning something :) — silvergasp, Jul 24 '22 at 03:03
Yeah. This is not a big difference, just a little correction. Sanitizers immediately abort AFAIK, but you can recover from panics (the underlying mechanism is similar to exceptions). — Chayim Friedman, Jul 24 '22 at 07:38
Oh, interesting. I wasn't aware that you could recover from a panic. I'll have to look into that some more. — silvergasp, Jul 24 '22 at 14:39
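
The recovery mentioned in the comments above can be sketched with `std::panic::catch_unwind`. This assumes the default unwinding panic strategy (with `panic = "abort"` the process would terminate instead), and `add_or_panic` and `try_add` are hypothetical names used only for illustration.

```rust
use std::panic;

/// Hypothetical helper: panics on overflow regardless of build profile.
pub fn add_or_panic(num: i32) -> i32 {
    num.checked_add(i32::MAX)
        .expect("attempt to add with overflow")
}

/// Runs the addition and turns a panic into `None` instead of letting it
/// propagate. Only works when panics unwind (the default strategy).
pub fn try_add(num: i32) -> Option<i32> {
    // `catch_unwind` returns `Err` if the closure panicked.
    panic::catch_unwind(move || add_or_panic(num)).ok()
}
```

With this, `try_add(1)` returns `None` (the panic message is still printed to stderr by the default panic hook), and the program carries on.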
