This post is a continuation of my previous post: (Call to ruby regex through C api from C code not working)

I made a few modifications, and now I am calling rb_reg_regcomp with "*".

#include <ruby.h>
#include "ruby/re.h"


#define MAX_INPUT_SIZE 1000

int main(int argc, char** argv) {
    VALUE x;
    char string[MAX_INPUT_SIZE];
    int result;
    result = ruby_setup();
    ruby_init();

    ruby_init_loadpath();




    memset(string, 0, MAX_INPUT_SIZE);

    fgets(string, MAX_INPUT_SIZE, stdin);

    if (string[MAX_INPUT_SIZE-2]) {
        return 0;
    }

    //printf("thing");
    x = rb_str_new_cstr("*");
    rb_reg_regcomp(x);




    


    return 0;

}

Now when I run this program and then press enter, I get this in gdb:

Program received signal SIGSEGV, Segmentation fault.
0x000055555565bef6 in rb_ec_tag_jump (st=st@entry=RUBY_TAG_RAISE, ec=<optimized out>) at ../eval_intern.h:161
161     ec->tag->state = st;
(gdb) where
#0  0x000055555565bef6 in rb_ec_tag_jump (st=st@entry=RUBY_TAG_RAISE, ec=<optimized out>) at ../eval_intern.h:161
#1  0x0000555555661fe0 in rb_longjmp (ec=ec@entry=0x6160000000d0, tag=tag@entry=6, mesg=<optimized out>, mesg@entry=140737288676920, cause=<optimized out>, cause@entry=36) at ../eval.c:658
#2  0x000055555566231d in rb_exc_exception (mesg=mesg@entry=140737288676920, tag=tag@entry=6, cause=cause@entry=36) at ../vm_core.h:1866
#3  0x0000555555668628 in rb_exc_raise (mesg=mesg@entry=140737288676920) at ../eval.c:684
#4  0x00005555559387a5 in rb_reg_raise_str (err=<optimized out>, options=0, str=140737288677040) at ../re.c:3300
#5  rb_reg_init_str (options=0, s=140737288677040, re=140737288677000) at ../re.c:3300
#6  rb_reg_new_str (options=0, s=140737288677040) at ../re.c:3291
#7  rb_reg_regcomp (str=140737288677040) at ../re.c:3373
#8  0x000055555565aca1 in main () at ../eval.c:856

How do I call the Ruby regex function correctly from C code so that the crash does not happen? Thanks in advance!

Edit: I compiled the ruby library from source. I am using commit a8e7fee80129b0ba360c2671582117c8e18a6464 .

Edit2: I know that "*" is not valid regex, but the original purpose of the program was to get a user to type their own regex and then make the ruby code compile the regex. This piece of code is to be used in a fuzzer which fuzzes the ruby regex parser to find bugs in it, so the program should be able to handle invalid regex strings gracefully instead of crashing.

Edit3: removed newline from the call to rb_str_new_cstr . Still crashes.

  • I wouldn't expect it to crash, but `*` is not a very good regex. This isn't a shell glob. – pmacfarlane Mar 16 '23 at 19:41
  • @pmacfarlane yeah, I got those mixed up, but it should fail gracefully with an error about an invalid regex string instead of crashing. Right? – Some nerd who does not have a Mar 16 '23 at 19:52
  • Maybe. That would be a graceful thing to do, but that doesn't mean you can rely on it. What does the documentation of the function you are calling have to say about it? (And for that matter, are its docs even accessible anywhere?) – John Bollinger Mar 16 '23 at 19:59
  • Also not sure that it would expect a newline character `\n` in the string. `"\n"` is probably a valid (if weird) part of a regex in Ruby, but the C compiler is going to convert it to a literal newline (ASCII 10). – pmacfarlane Mar 16 '23 at 20:07
  • @JohnBollinger There is only limited documentation available, but looking at the source here: https://github.com/ruby/ruby/blob/182f4f0d1c88771e688ad37e571282c67c1dbf19/re.c#L3295 (the regex function internally calls rb_reg_init_str) the function should run rb_reg_raise_str which assuming from the name should raise an error about an invalid regex string when the return value of rb_reg_initialize_str is not zero (aka failure) . In the backtrace you can see that rb_reg_raise_str gets called, but then deeper in the call it crashes with an invalid dereference. – Some nerd who does not have a Mar 16 '23 at 20:08
  • @pmacfarlane I removed the newline and recompiled, but it still crashes in the same place. – Some nerd who does not have a Mar 16 '23 at 20:10
  • Does it work if you do a really simple regex like `"hello"` ? – pmacfarlane Mar 16 '23 at 20:11
  • @pmacfarlane Yes it works when the regex is valid. I tried it with "hello" and also with "W[aeiou]rd" and both did not crash. Edit: so the problem is in the error handling function. Looking at gdb it shows an address near null, so it is possibly because i forgot to initialize something. I could patch the source code to return -1 instead of calling the error handling, but I really do not want to do that. – Some nerd who does not have a Mar 16 '23 at 20:12
  • Then I guess it is raising a Ruby exception, just like it does if you type `/*/` in `irb`. Possibly the C API includes support for `begin ... rescue... end...`. I've never used the C interface into Ruby though. – pmacfarlane Mar 16 '23 at 20:15
  • @pmacfarlane but shouldn't the exception still be handled gracefully, even when the code is being called from c code? – Some nerd who does not have a Mar 16 '23 at 20:17
  • I don't know how the C interface handles exceptions. Looking at the call stack, it clearly knows it is raising an exception as a result of the bad regex. You should study the documentation for the C API. [This](https://silverhammermba.github.io/emberb/c/) looks to have [some useful information](https://silverhammermba.github.io/emberb/c/#rescue). – pmacfarlane Mar 16 '23 at 20:22
  • It's curious to see Ruby being called from C. It's usually the other way around, to improve performance. – Cary Swoveland Mar 16 '23 at 20:25
  • @pmacfarlane Ok. I will take a look at that and maybe I will come up with something. Thanks for the help! – Some nerd who does not have a Mar 16 '23 at 20:28
  • Yeah I totally missed this from the documentation: "If you’re embedding the Ruby interpreter in C, you need to be extremely careful when calling API functions that could raise exceptions: an uncaught exception will segfault the VM and kill your program" .Thanks. – Some nerd who does not have a Mar 16 '23 at 20:33

1 Answer

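After reading the documentation I fixed the code. The call that can raise is now wrapped in rb_protect, so an invalid regex sets a nonzero state instead of crashing the process. This code (I think) works: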

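#include <stdio.h>
#include <string.h>

#include <ruby.h>
#include "ruby/re.h"

#define MAX_INPUT_SIZE 120

/* Called through rb_protect, so an exception raised by rb_reg_regcomp
   (for example on an invalid regex) is caught instead of crashing. */
static VALUE dangerous_func(VALUE x)
{
    /* rb_reg_regcomp returns a Regexp object as a VALUE; storing it in an
       int would truncate it on 64-bit platforms. */
    return rb_reg_regcomp(x);
}

int main(int argc, char** argv) {
    VALUE x;
    VALUE result;
    int state = 0;
    char string[MAX_INPUT_SIZE];

    ruby_setup();
    ruby_init();
    ruby_init_loadpath();

    memset(string, 0, MAX_INPUT_SIZE);

    if (fgets(string, MAX_INPUT_SIZE, stdin) == NULL) {
        return 0;
    }

    /* The buffer was zeroed, so a nonzero byte here means the input filled it. */
    if (string[MAX_INPUT_SIZE-2]) {
        return 0;
    }

    x = rb_str_new_cstr(string);

    /* result is the Regexp on success and Qnil if an exception was raised. */
    result = rb_protect(dangerous_func, x, &state);
    (void)result;

    /* state is 0 on success and nonzero if an exception was raised. */
    printf("state %d\n", state);

    if (state) {
        /* Clear the pending exception so it does not leak into later calls. */
        rb_set_errinfo(Qnil);
    }

    return 0;
}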

Thanks to @pmacfarlane for the info!