Primer: This question is quite long, because I want to give an overview of my current understanding of the inner mechanisms of MRI and how I came to my conclusions. I want to understand the code better, so please correct me if any assumption I'm making is wrong.
I'm trying to find out where MRI Ruby stores the data part (aka the contents) of a String, because I'd like to create String objects which reuse memory allocated by another binary (same allocator of course).
Here's what I know so far:
RString: internal representation of a String.
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};
From the above snippet I conclude that there are 2 ways the data can be stored:
- on the heap via the heap struct (ptr points to the data)
- in the ary char array directly (probably some optimization)
I'm only interested in the heap case.
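To check that I'm reading the union correctly, here is a toy model of the two cases in plain C. It is only a sketch: ToyString, TOY_EMBED_MAX and the toy_* functions are names I made up, the limit of 23 is arbitrary, and the way I track the embed flag and the embedded length is a simplification (as far as I can tell MRI keeps that information in the flags of RBasic).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names exist in MRI. */
#define TOY_EMBED_MAX 23

typedef struct {
    int embedded;       /* simplification: stands in for the NOEMBED flag */
    long embed_len;     /* simplification: length of the embedded string */
    union {
        struct {
            long len;
            char *ptr;
            long capa;
        } heap;
        char ary[TOY_EMBED_MAX + 1];
    } as;
} ToyString;

/* Copy len bytes of src into a new toy string, picking the storage by length. */
ToyString toy_new(const char *src, long len)
{
    ToyString s;
    memset(&s, 0, sizeof s);
    assert(len >= 0);
    if (len > TOY_EMBED_MAX) {
        /* heap case: the bytes live in a separately malloc'ed buffer */
        s.embedded = 0;
        s.as.heap.len = len;
        s.as.heap.capa = len;
        s.as.heap.ptr = malloc((size_t)len + 1);
        assert(s.as.heap.ptr != NULL);
        memcpy(s.as.heap.ptr, src, (size_t)len);
        s.as.heap.ptr[len] = '\0';
    } else {
        /* embedded case: the bytes live inside the struct itself */
        s.embedded = 1;
        s.embed_len = len;
        memcpy(s.as.ary, src, (size_t)len);
        s.as.ary[len] = '\0';
    }
    return s;
}

/* Return a pointer to the bytes, wherever they are stored. */
const char *toy_ptr(const ToyString *s)
{
    return s->embedded ? s->as.ary : s->as.heap.ptr;
}

/* Only the heap case owns a buffer that has to be freed. */
void toy_free(ToyString *s)
{
    if (!s->embedded) free(s->as.heap.ptr);
}
```

In this model only the heap case has a ptr that could, in principle, point at memory somebody else allocated, which is why that is the case I care about.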
str_new0() seems to be the most common way to create a String from a pointer to some string data and a length.
static VALUE
str_new0(VALUE klass, const char *ptr, long len, int termlen)
{
    VALUE str;
    if (len < 0) {
        rb_raise(rb_eArgError, "negative string size (or size too big)");
    }
    RUBY_DTRACE_CREATE_HOOK(STRING, len);
    str = str_alloc(klass);
    if (len > RSTRING_EMBED_LEN_MAX) {
        RSTRING(str)->as.heap.aux.capa = len;
        RSTRING(str)->as.heap.ptr = ALLOC_N(char, len + termlen);
        STR_SET_NOEMBED(str);
    }
    else if (len == 0) {
        ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT);
    }
    if (ptr) {
        memcpy(RSTRING_PTR(str), ptr, len);
    }
    STR_SET_LEN(str, len);
    TERM_FILL(RSTRING_PTR(str) + len, termlen);
    return str;
}
Memory is allocated with the macro ALLOC_N, which is an alias for RB_ALLOC_N, which expands to ruby_xmalloc2(), which calls objspace_xmalloc2(), which calls objspace_xmalloc0().
Phew
static void *
objspace_xmalloc0(rb_objspace_t *objspace, size_t size)
{
    void *mem;
    size = objspace_malloc_prepare(objspace, size);
    TRY_WITH_GC(mem = malloc(size));
    size = objspace_malloc_size(objspace, mem, size);
    objspace_malloc_increase(objspace, mem, size, 0, MEMOP_TYPE_MALLOC);
    return objspace_malloc_fixup(objspace, mem, size);
}
So here we are. TRY_WITH_GC seems to check whether the allocation mem = malloc(size) succeeds and, if it does not, to try again after a GC run, I think.
#define TRY_WITH_GC(alloc) do { \
        objspace_malloc_gc_stress(objspace); \
        if (!(alloc) && \
            (!garbage_collect_with_gvl(objspace, TRUE, TRUE, TRUE, GPR_FLAG_MALLOC) || /* full/immediate mark && immediate sweep */ \
             !(alloc))) { \
            ruby_memerror(); \
        } \
    } while (0)
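Here is how I read the control flow of that macro, written out as an ordinary function so the short-circuiting is easier to follow. This is a sketch under my own assumptions: toy_try_with_gc, the two function-pointer types and the fake allocator are hypothetical names, and returning NULL stands in for the call to ruby_memerror().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hooks for illustration; none of these names exist in MRI. */
typedef void *(*toy_alloc_fn)(size_t size);
typedef int (*toy_gc_fn)(void);   /* returns nonzero if the collection ran */

/* Try the allocation once; if it fails, run a GC and retry exactly once.
 * Returns NULL where the macro would call ruby_memerror(). */
void *toy_try_with_gc(toy_alloc_fn alloc, toy_gc_fn gc, size_t size)
{
    void *mem;
    assert(alloc != NULL && gc != NULL);
    mem = alloc(size);
    if (mem == NULL) {
        if (!gc()) return NULL;   /* the GC itself could not run */
        mem = alloc(size);        /* second and last attempt */
    }
    return mem;
}

/* Test doubles: the allocator fails until the fake GC has run once. */
int toy_gc_runs = 0;

void *toy_flaky_alloc(size_t size)
{
    return toy_gc_runs > 0 ? malloc(size) : NULL;
}

int toy_fake_gc(void)
{
    toy_gc_runs++;
    return 1;
}
```

If this reading is right, the GC is only a fallback for a failed malloc here, and a successful first attempt never touches it.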
Here's the first thing I'm unsure about: it seems to just malloc some memory with the plain system malloc (important: not inside objspace). Is this the case? I don't know whether malloc is overridden somewhere to allocate in a GC-friendly way.
OK, after that they mutate objspace with objspace_malloc_increase() and friends. I don't understand what these functions do. They do not seem to store the pointer mem in objspace, but maybe I overlooked it. I need clarification here.
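My tentative guess, which I have not confirmed in gc.c and which may well be wrong, is that these functions only do byte accounting: they add the allocated size to a counter and use it to decide when the next GC should run, without remembering the pointer at all. The toy model below shows what I mean. ToyObjspace, its fields and toy_xmalloc are hypothetical names, and resetting the counter after a "GC" is my own simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of my guess; none of these names are MRI's. */
typedef struct {
    size_t malloc_increase;  /* bytes allocated since the last "GC" */
    size_t malloc_limit;     /* threshold that triggers a "GC" */
    int gc_count;            /* how many times a "GC" was triggered */
} ToyObjspace;

/* Allocate with the system malloc and record only the size, never the pointer. */
void *toy_xmalloc(ToyObjspace *os, size_t size)
{
    void *mem;
    assert(os != NULL);
    mem = malloc(size);
    if (mem == NULL) return NULL;
    os->malloc_increase += size;
    if (os->malloc_increase > os->malloc_limit) {
        os->gc_count++;           /* stands in for starting a collection */
        os->malloc_increase = 0;  /* simplification: start counting again */
    }
    return mem;
}
```

If that is all that happens, then the buffer is only reachable through the ptr of the String that owns it, which is what makes me wonder how foreign memory would have to be handled.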
As noted at the beginning, I want to write code that creates a Ruby String which uses memory allocated by some other binary, e.g. C via FFI, with the system allocator of course. Do I have to register my "foreign" memory via the objspace_* functions? If yes, how exactly does that work? And are there subtleties when it comes to freeing the memory again? (I guess the GC does that, but what conditions must hold for this to work?)
I hope my question is not too vague; I can make it more precise if necessary!
Thanks in advance!