Note: This post is similar to, but not quite the same as, a more open-ended question asked on Reddit: https://www.reddit.com/r/rakulang/comments/vvpikh/looking_for_guidance_on_getting_nativecall/

I'm trying to use the md4c C library to process a markdown file with its md_parse function. I'm having no success: the program just quietly dies. I don't think I'm calling it with the right arguments.

Documentation for the function is here: https://github.com/mity/md4c/wiki/Embedding-Parser%3A-Calling-MD4C

I'd like to at least figure out the minimum amount of code needed to do this without error. This is my latest attempt, though I've tried many:

use v6.d;
use NativeCall;

sub md_parse(str, int32, Pointer is rw ) is native('md4c') returns int32 { * }
md_parse('hello', 5, Pointer.new());


say 'hi'; # this never gets printed
StevieD
  • Thanks Raiph. I did try different variations for the 4th argument. I also tried using a class for CStruct (didn't work) and CPointer (couldn't figure out how to get that to work either). – StevieD Jul 10 '22 at 21:24
  • Thanks. I actually posted on Reddit first. But figured it was too broad so wasn't getting much of a response. I tried to ask a more narrow question here on SO. – StevieD Jul 10 '22 at 23:08
  • @raiph I have no problem deleting comments if they're not helpful. StevieD maybe these links are a nudge in the right direction? https://stackoverflow.com/questions/35246529/nativecall-struct-which-contains-pointer OR https://stackoverflow.com/questions/63126852/raku-how-to-pass-a-pointer-to-a-buf-to-a-native-call-for-writing – jubilatious1 Jul 11 '22 at 04:23

1 Answer

md4c is a SAX-like streaming parser that calls your functions when it encounters markdown elements. If you call it with an uninitialised Pointer, or with an uninitialised CStruct, the program will crash with a segmentation fault (SEGV) when the md4c library tries to call a null function pointer.

The README says:

The main provided function is md_parse(). It takes a text in the Markdown syntax and a pointer to a structure which provides pointers to several callback functions.

As md_parse() processes the input, it calls the callbacks (when entering or leaving any Markdown block or span; and when outputting any textual content of the document), allowing application to convert it into another format or render it onto the screen.

The function signature of md_parse is:

int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);

In order for md_parse() to work, you will need to:

  • define a native CStruct that matches the MD_PARSER type definition
  • create an instance of this CStruct
  • initialise all the function pointers with Raku functions that have the right function signature
  • call md_parse() with the initialised CStruct instance as the third parameter

The 4th parameter to md_parse() is void* userdata, a pointer that you provide and that is passed back to you as the last parameter of each callback function. My guess is that it's optional, and that if you pass a null value you'll simply be called back with a null userdata parameter in each callback.
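To make the callback-struct pattern and the role of userdata concrete, here is a minimal C sketch. It is not the md4c API: `mini_parser`, `mini_parse`, `count_block`, `count_text` and `count_callbacks` are hypothetical names invented for illustration, and the mock "parser" simply calls each callback once. It only shows why every function pointer must be set before parsing, and how the userdata pointer travels unchanged into each callback.

```c
#include <stddef.h>

/* Hypothetical miniature of a callback-driven parser; NOT the md4c API. */
typedef struct {
    int (*enter_block)(int type, void *userdata);
    int (*leave_block)(int type, void *userdata);
    int (*text)(const char *text, size_t size, void *userdata);
} mini_parser;

/* Calls every callback exactly once and stops at the first non-zero return.
 * If any member were NULL, this would be a call through a null function
 * pointer, which normally crashes the process. */
int mini_parse(const char *input, size_t size, const mini_parser *p, void *userdata)
{
    int ret;
    if ((ret = p->enter_block(0, userdata)) != 0) return ret;
    if ((ret = p->text(input, size, userdata)) != 0) return ret;
    return p->leave_block(0, userdata);
}

/* Example callbacks: count invocations through userdata when it is non-NULL. */
int count_block(int type, void *userdata)
{
    (void)type;
    if (userdata != NULL) ++*(int *)userdata;
    return 0;
}

int count_text(const char *text, size_t size, void *userdata)
{
    (void)text;
    (void)size;
    if (userdata != NULL) ++*(int *)userdata;
    return 0;
}

/* Runs the mock parser with a counter as userdata.
 * Returns the number of callback invocations, or -1 if parsing was aborted. */
int count_callbacks(const char *input, size_t size)
{
    int calls = 0;
    mini_parser p = { count_block, count_block, count_text };
    return mini_parse(input, size, &p, &calls) == 0 ? calls : -1;
}
```

Passing NULL as userdata to `mini_parse` is harmless in this sketch because the callbacks check for it. Whether a real library tolerates a null userdata depends on whether the library itself ever dereferences the pointer.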

Followup

This turned into an interesting rabbit hole to fall down.

The code that makes it possible to pass a Raku sub as a callback parameter to a native function is quite complex and relies on MoarVM ops to build and cache the FFI callback trampoline. This is a piece of code that marshals the C calling convention parameters into a call that MoarVM can dispatch to a Raku sub.

It will be a sizeable task to implement equivalent functionality to provide some kind of nativecast that will generate the required callback trampoline and return a Pointer that can be assigned into a CStruct.

But we can cheat

We can use a simple C function to return the pointer to a generated callback trampoline as if it was for a normal callback sub. We can then store this pointer in our CStruct and our problem is solved. The generated trampoline is specific to the function signature of the Raku sub we want to call, so we need to generate a different NativeCall binding for each function signature we need.

The C function:

void* get_pointer(void* p)
{
    return p;
}
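As a C-only illustration of why an identity function is enough, the sketch below passes a function's address through `get_pointer`, stores the result in a struct member, and calls it. `callback_slot`, `sample_enter_block` and `call_via_get_pointer` are hypothetical names used only here. Converting between function pointers and `void*` is not guaranteed by ISO C, but POSIX platforms support it, and the sketch assumes such a platform.

```c
#include <stddef.h>

/* The identity helper from the answer. */
void *get_pointer(void *p)
{
    return p;
}

/* Hypothetical one-slot struct standing in for a parser's callback table. */
typedef struct {
    int (*enter_block)(unsigned type, void *detail, void *userdata);
} callback_slot;

/* Hypothetical callback; returns type + 1 so the call is observable. */
int sample_enter_block(unsigned type, void *detail, void *userdata)
{
    (void)detail;
    (void)userdata;
    return (int)type + 1;
}

/* Round-trips a function address through get_pointer(), stores it in the
 * struct, and calls it. The function-pointer/void* casts assume a POSIX-like
 * platform where that conversion is supported. */
int call_via_get_pointer(unsigned type)
{
    callback_slot slot;
    void *raw = get_pointer((void *)sample_enter_block);
    slot.enter_block = (int (*)(unsigned, void *, void *))raw;
    return slot.enter_block(type, NULL, NULL);
}
```

In the Raku version the argument is not a plain C function but the trampoline that NativeCall generates for the Raku sub, so the returned address is only as valid as that trampoline remains.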

Binding a NativeCall sub for the function signature we need:

sub get_enter_leave_fn(&func (uint32, Pointer, Pointer))
  is native('./getpointer') is symbol('get_pointer') returns Pointer { * }

Initialising a CStruct attribute:

$!enter_block := get_enter_leave_fn(&enter_block);

Putting it all together:

use NativeCall;

enum BlockType < DOC QUOTE UL OL LI HR H CODE HTML P TABLE THEAD TBODY TR TH TD >;
enum SpanType < EM STRONG A IMG SPAN_CODE DEL SPAN_LATEXMATH LATEXMATH_DISPLAY WIKILINK SPAN_U >;
enum TextType < NORMAL NULLCHAR BR SOFTBR ENTITY TEXT_CODE TEXT_HTML TEXT_LATEXMATH >;

# md4c treats a non-zero return value from a callback as a request to abort
# parsing, so each callback returns 0 explicitly.
sub enter_block(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
    say "enter block { BlockType($type) }";
    0
}

sub leave_block(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
    say "leave block { BlockType($type) }";
    0
}

sub enter_span(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
    say "enter span { SpanType($type) }";
    0
}

sub leave_span(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
    say "leave span { SpanType($type) }";
    0
}

sub text(uint32 $type, str $text, uint32 $size, Pointer $userdata --> int32) {
    # The text md4c passes is not NUL-terminated at $size, so trim it here.
    # This assumes ASCII input, where the byte count equals the character count.
    say "text '{$text.substr(0..^$size)}'";
    0
}

sub debug_log(str $msg, Pointer $userdata --> int32) {
    note $msg;
    0
}

#
# Cast functions that are specific to the required function signature.
#
# Makes use of a utility C function that returns its `void*` parameter, compiled
# into a shared library called libgetpointer.dylib (on macOS)
#
# gcc -shared -o libgetpointer.dylib get_pointer.c
#
# void* get_pointer(void* p)
# {
#     return p;
# }
#
# Each cast function uses NativeCall to build an FFI callback trampoline that gets
# cached in an MVMThreadContext. The generated callback code is specific to the
# function signature of the Raku function that will be called.
#

sub get_enter_leave_fn(&func (uint32, Pointer, Pointer))
  is native('./getpointer') is symbol('get_pointer') returns Pointer { * }

sub get_text_fn(&func (uint32, str, uint32, Pointer))
  is native('./getpointer') is symbol('get_pointer') returns Pointer { * }

sub get_debug_fn(&func (str, Pointer))
  is native('./getpointer') is symbol('get_pointer') returns Pointer { * }

class MD_PARSER is repr('CStruct') {
    has uint32                        $!abi_version; # unsigned int abi_version
    has uint32                        $!flags; # unsigned int flags
    has Pointer                       $!enter_block; # F:int ( )* enter_block
    has Pointer                       $!leave_block; # F:int ( )* leave_block
    has Pointer                       $!enter_span; # F:int ( )* enter_span
    has Pointer                       $!leave_span; # F:int ( )* leave_span
    has Pointer                       $!text; # F:int ( )* text
    has Pointer                       $!debug_log; # F:void ( )* debug_log
    has Pointer                       $!syntax; # F:void ( )* syntax

    submethod TWEAK() {
        $!abi_version = 0;
        $!flags = 0;
        $!enter_block := get_enter_leave_fn(&enter_block);
        $!leave_block := get_enter_leave_fn(&leave_block);
        $!enter_span := get_enter_leave_fn(&enter_span);
        $!leave_span := get_enter_leave_fn(&leave_span);
        $!text := get_text_fn(&text);
        $!debug_log := get_debug_fn(&debug_log);
    }
}

# C `int` maps to int32 here; Raku's native `int` is wider on 64-bit platforms.
sub md_parse(str, uint32, MD_PARSER, Pointer is rw) is native('md4c') returns int32 { * }

my $parser = MD_PARSER.new;

my $md = '
# Heading

## Sub Heading

hello *world*
';

# md_parse() expects the size in bytes, which differs from .chars for non-ASCII text.
md_parse($md, $md.encode.bytes, $parser, Pointer.new);

The output:

./md4c.raku
enter block DOC
enter block H
text 'Heading'
leave block H
enter block H
text 'Sub Heading'
leave block H
enter block P
text 'hello '
enter span EM
text 'world'
leave span EM
leave block P
leave block DOC

In summary, it's possible. I'm not sure whether I'm proud of this or horrified by it. I think a long-term solution will require refactoring the callback trampoline generator into a separate nqp op that can be exposed to Raku as a nativewrap-style operation.

donaldh
  • Thanks. I'm having trouble figuring out how to do your 3rd bullet. How do I get the address of a subroutine and assign it to the attribute of the CStruct object? – StevieD Jul 11 '22 at 22:54
  • I can't seem to get the C code to call the callback in my CStruct. Example: ` has Pointer $.enter_block = -> ($x?, $y?, $z?) { return 0; };` – StevieD Jul 12 '22 at 05:52
  • And this is the C code trying to call "enter_block": `ret = ctx->parser.enter_block((type), (arg), ctx->userdata);` – StevieD Jul 12 '22 at 05:53
  • I've now tried as well and can't get anything to work. NativeCall does support passing a callback sub as a parameter to a native function call, but I don't think there is any support for assigning a sub to a CStruct attribute. I dived into the NativeCall code to see how passing a sub as a parameter works but haven't fully grokked it yet. – donaldh Jul 12 '22 at 10:30
  • https://github.com/rakudo/rakudo/issues/4289 I think you may be right that it's not yet possible. – StevieD Jul 12 '22 at 11:23