29

I noticed that Rust's `test` crate has a benchmark mode that measures execution time in ns/iter, but I could not find a way to measure memory usage.

How would I implement such a benchmark? Let us assume for now that I only care about heap memory (though stack usage would certainly also be interesting).

Edit: I found this issue which asks for the exact same thing.

llogiq
  • I suspect that generic methods (also available in C/C++) would work, but I never found a fine-grained way of measuring in a generic fashion :( – Matthieu M. Jun 16 '15 at 13:48
  • @Matthieu M. Yes, that'd work, but it would require that I break out all my benchmark methods into separate binaries, which is burdensome. Also it may or may not give correct results. – llogiq Jun 16 '15 at 13:54
  • To do this from inside of the program, I'd expect that you'd have to wait until allocators are pluggable. Then you'd have to make sure that every heap allocation you make uses a provided allocator, and then implement an allocator that tracks how much memory is lent out at any given time. I wish that valgrind's memory tracking worked with jemalloc... – Shepmaster Jun 16 '15 at 14:47

5 Answers

16

You can use the jemalloc allocator to print the allocation statistics. For example,

Cargo.toml:

[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"

[dependencies]
jemallocator = "0.5"
jemalloc-sys = {version = "0.5", features = ["stats"]}
libc = "0.2"

src/main.rs:

use libc::{c_char, c_void};
use std::ptr::{null, null_mut};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
    print!("{}", String::from_utf8_lossy(unsafe {
        std::ffi::CStr::from_ptr(message as *const i8).to_bytes()
    }));
}

fn mem_print() {
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) }
}

fn main() {
    mem_print();
    let _heap = Vec::<u8>::with_capacity(1024 * 128);
    mem_print();
}

In a single-threaded program this should allow you to get a good measurement of how much memory a structure takes: print the statistics before the structure is created and again after, and calculate the difference.

(In particular, the "allocated" column of the "total:" row.)


You can also use Valgrind (Massif) to get a heap profile. It works just as it does with any other C program. Make sure debug symbols are enabled in the executable (e.g. by using a debug build or a custom Cargo configuration). You can use, say, http://massiftool.sourceforge.net/ to analyse the generated heap profile.

(I verified this works on Debian Jessie; in a different setting your mileage may vary.)

(In order to use Rust with Valgrind you'll probably have to switch back to the system allocator).
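
For reference, a typical invocation (using the binary name from the Cargo.toml above) would be `cargo build` followed by `valgrind --tool=massif ./target/debug/stackoverflow-30869007`; the resulting `massif.out.<pid>` file can then be inspected with `ms_print` or a graphical viewer.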

P.S. There is now also a better DHAT (Valgrind's dynamic heap analysis tool).


jemalloc can be told to dump a memory profile. You can probably do this with the Rust FFI but I haven't investigated this route.
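
If you do want to explore that route, an untested sketch (assuming jemalloc was built with profiling support and the process was started with something like MALLOC_CONF="prof:true") would be to trigger a dump through the raw mallctl interface:

use std::ffi::CString;
use std::ptr::null_mut;

// Ask jemalloc to write a heap profile to disk via the "prof.dump" control.
// This fails unless profiling was compiled in and enabled at startup.
fn dump_heap_profile() {
    let name = CString::new("prof.dump").unwrap();
    let ret = unsafe {
        jemalloc_sys::mallctl(name.as_ptr(), null_mut(), null_mut(), null_mut(), 0)
    };
    if ret != 0 {
        eprintln!("prof.dump failed with code {} (is profiling enabled?)", ret);
    }
}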

ArtemGr
  • I compiled in debug mode (with `cargo build`), but I do not see the line numbers in the `massif` dump. I also tried `rustc -g` with the same result. Do you know why? – antoyo Oct 06 '15 at 16:27
  • @antoyo Try switching to the system allocator with `extern crate alloc_system;`. – ArtemGr Apr 19 '16 at 10:35
  • Also there is [jemalloc_ctl](https://docs.rs/jemalloc-ctl/0.3.3/jemalloc_ctl/) crate, which provides safe convenient typed API (e.g. [`jemalloc_ctl::stats::resident`](https://docs.rs/jemalloc-ctl/0.3.3/jemalloc_ctl/stats/struct.resident.html)) – diralik Mar 28 '20 at 14:55
  • `unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) };` is way too verbose in the stats it prints. What do I look for when calculating the difference before and after creating a heap object? – RequireKeys Jul 01 '22 at 04:56
  • @purple_turtle I'd start with the single value which is in the table where the row is "total:" and the column is "allocated". – ArtemGr Jul 02 '22 at 09:19
9

As far as measuring data structure sizes is concerned, this can be done fairly easily through the use of traits and a small compiler plugin. Nicholas Nethercote in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust) demonstrates how it works in Servo; it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is a good way of allowing precise checking of where memory is going, too; it is, however, comparatively intrusive as it requires changes to be made in the first place, where something like jemalloc’s print_stats() doesn’t. Still, for good and precise measurements, it’s a sound approach.
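
To make the shape of this concrete, here is a rough, simplified sketch of such a trait. This is not the real Servo/heapsize implementation (which asks jemalloc for the actual size of each allocation rather than relying on capacities); the type names are made up for illustration:

// A minimal, hypothetical sketch of the trait-based approach.
trait HeapSizeOf {
    /// Heap memory owned by the children of `self`, in bytes
    /// (the shallow size of `self` itself is accounted for by its owner).
    fn heap_size_of_children(&self) -> usize;
}

impl HeapSizeOf for String {
    fn heap_size_of_children(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSizeOf> HeapSizeOf for Vec<T> {
    fn heap_size_of_children(&self) -> usize {
        self.capacity() * std::mem::size_of::<T>()
            + self.iter().map(HeapSizeOf::heap_size_of_children).sum::<usize>()
    }
}

// For your own types, #[derive(HeapSizeOf)] (or a manual impl) just sums the fields:
struct Document {
    title: String,
    paragraphs: Vec<String>,
}

impl HeapSizeOf for Document {
    fn heap_size_of_children(&self) -> usize {
        self.title.heap_size_of_children() + self.paragraphs.heap_size_of_children()
    }
}

fn main() {
    let doc = Document {
        title: "example".to_string(),
        paragraphs: vec!["lorem".to_string(), "ipsum".to_string()],
    };
    println!("~{} heap bytes owned by doc", doc.heap_size_of_children());
}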

Chris Morgan
  • This is a good point, and if we want to measure memory usage *of a specific structure*, it's great. However, as you say, it's quite intrusive (which may be OK for many use cases, though), and it doesn't necessarily get the whole picture (as there may be side channels that store data, e.g. a global table). – llogiq Jun 17 '15 at 08:54
  • @llogiq: As Nicholas mentions, handling things like shared ownership is still an open question—but you get to decide how to handle it. Traits make deciding how to implement such a thing quite easy. You can handle such things in whatever manner you choose. – Chris Morgan Jun 17 '15 at 12:34
  • Fair enough. I really like the approach; it affords a lot of control. However, it's also a bit finicky and easy to get wrong. So while it may be reasonable in many cases, it is not the solution I'm looking for. – llogiq Jun 17 '15 at 12:59
6

Currently, the only way to get allocation information is the alloc::heap::stats_print() method (behind #![feature(alloc)]), which calls jemalloc's malloc_stats_print().

I'll update this answer with further information once I have learned what the output means.

(Note that I'm not going to accept this answer, so if someone comes up with a better solution...)

llogiq
  • You mentioned not liking the other answer "as there may be side channels that store data", but note that this answer only tracks memory from jemalloc, so if your function calls into C code that uses any other allocator, it will not be included in this report. – Shepmaster Jun 17 '15 at 14:09
  • True. This is one caveat that makes the jemalloc-based approach less useful once we deal with non-Rust code (or once we have pluggable allocators, which on the other hand could make the whole point moot). Yet another reason why I'm not quite satisfied with either answer. – llogiq Jun 17 '15 at 14:29
  • The `alloc::heap::stats_print()` function no longer exists in current Rust (e.g. 1.42). – diralik Mar 28 '20 at 14:52
  • @diralik do you know what to use instead? – invis May 10 '20 at 19:05
  • @invis I use [`jemalloc_ctl`](https://docs.rs/jemalloc-ctl/latest/jemalloc_ctl/), here are [details](https://stackoverflow.com/a/61728864/5812238) – diralik May 11 '20 at 11:41
4

There is now the jemalloc_ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:

[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"

Then configure jemalloc as the global allocator and use the methods from the jemalloc_ctl::stats module. Here is the official example:

use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // many statistics are cached and only updated when the epoch is advanced.
        epoch::advance().unwrap();

        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated/{} bytes resident", allocated, resident);
        thread::sleep(Duration::from_secs(10));
    }
}
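
Closer to the original benchmarking question, the same API can be used to measure a single data structure by diffing the allocated counter around its construction. A minimal single-threaded sketch (assuming jemalloc is installed as the global allocator, as above):

use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Refresh jemalloc's cached statistics and return the current "allocated" value.
fn allocated_bytes() -> usize {
    epoch::advance().unwrap();
    stats::allocated::read().unwrap()
}

fn main() {
    let before = allocated_bytes();
    let data: Vec<u64> = (0..100_000).collect();
    let after = allocated_bytes();
    // Roughly the heap footprint of `data` (plus any incidental allocations).
    println!("~{} bytes allocated", after - before);
    drop(data);
}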
diralik
  • is there a way to print/visualize **stack** memory state? – invis May 12 '20 at 07:49
  • @invis it is probably better to ask that as a separate (new) question. But one possible solution is to create a variable at the beginning of the `main` function and obtain a pointer by taking a reference to it. Later, when you want to measure the stack size of the current thread, create another variable and take a reference to that second variable. The stack size is the difference between those two pointers. – diralik May 12 '20 at 17:15
0

There's a neat little solution someone put together here: https://github.com/discordance/trallocator/blob/master/src/lib.rs

use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

// Wraps any underlying allocator and keeps a running count of live heap bytes.
pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> Trallocator<A> {
    pub const fn new(a: A) -> Self {
        Trallocator(a, AtomicU64::new(0))
    }

    pub fn reset(&self) {
        self.1.store(0, Ordering::SeqCst);
    }
    pub fn get(&self) -> u64 {
        self.1.load(Ordering::SeqCst)
    }
}

Usage: (from: https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9)

// Nightly feature gates needed for the Trallocator struct as originally written;
// on recent stable Rust (where AtomicU64 and `const fn` with trait bounds are stable)
// they are no longer required.
#![feature(integer_atomics, const_fn_trait_bound)]

use std::alloc::System;

// Assumes the Trallocator definition above is in scope here
// (e.g. in the same file, or via a hypothetical `mod trallocator; use trallocator::Trallocator;`).
#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);

fn main() {
    GLOBAL.reset();
    println!("memory used: {} bytes", GLOBAL.get());
    {
        let mut vec = vec![1, 2, 3, 4];
        for i in 5..20 {
            vec.push(i);
            println!("memory used: {} bytes", GLOBAL.get());
        }
        for v in vec {
            println!("{}", v);
        }
    }
    // For some reason this does not print zero =/
    // (most likely because printing itself allocates, e.g. the stdout buffer stays live)
    println!("memory used: {} bytes", GLOBAL.get());
}

I've just started using it, and it seems to work well! Straightforward, works in real time, requires no external packages, and doesn't require changing your base memory allocator.

It's also nice that, because it intercepts the allocate/deallocate calls, you should be able to add custom logic if desired (e.g. if memory usage goes above X, print the stack trace to see what's triggering the allocations) -- although I haven't tried this yet; a rough sketch of that idea is included below.

I also haven't yet tested to see how much overhead this approach adds. If someone does a test for this, let me know!
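
For what it's worth, here is a rough, untested sketch of that kind of custom logic (the LimitAlloc type and the 64 MiB threshold are made up for illustration). It only sets a flag when the limit is crossed, because printing or capturing a backtrace inside alloc can itself allocate and re-enter the allocator:

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

// Hypothetical wrapper: counts live heap bytes and flags when usage crosses a limit.
struct LimitAlloc {
    inner: System,
    used: AtomicU64,
    limit: u64,
    limit_hit: AtomicBool,
}

unsafe impl GlobalAlloc for LimitAlloc {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        let now = self.used.fetch_add(l.size() as u64, Ordering::SeqCst) + l.size() as u64;
        if now > self.limit {
            // Only record the event here; report it from normal code later.
            self.limit_hit.store(true, Ordering::SeqCst);
        }
        self.inner.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.inner.dealloc(ptr, l);
        self.used.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

#[global_allocator]
static LIMITER: LimitAlloc = LimitAlloc {
    inner: System,
    used: AtomicU64::new(0),
    limit: 64 * 1024 * 1024, // arbitrary 64 MiB threshold
    limit_hit: AtomicBool::new(false),
};

fn main() {
    let _big = vec![0u8; 128 * 1024 * 1024]; // deliberately exceed the limit
    if LIMITER.limit_hit.load(Ordering::SeqCst) {
        eprintln!("heap usage went above the configured limit");
    }
}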

Venryx