
I've started playing with LLVM, making a pet language. I'm using the C-API. I have a parser and a basic AST, but I've hit a bit of a roadblock with LLVM.

The following is a minimal version of my code that illustrates my current issue:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "llvm-c/Core.h"
#include "llvm-c/ExecutionEngine.h"
#include "llvm-c/Target.h"
#include "llvm-c/Analysis.h"
#include "llvm-c/BitWriter.h"

static LLVMModuleRef mod;
static LLVMBuilderRef builder;
static LLVMExecutionEngineRef engine;

typedef struct oper_t {
    const char * name;
    
    LLVMTypeRef args[2];
    LLVMTypeRef ret; 
    LLVMValueRef val;
} oper_t;

#define NUM_OPER 2
static oper_t oper[NUM_OPER] = {
    { .name = "function1" },
    { .name = "function2" },
};

void codegen_init(const char * filename)
{
    char *error;
 
    mod = LLVMModuleCreateWithName(filename);
    builder = LLVMCreateBuilder();
    
    error = NULL;
    LLVMVerifyModule(mod, LLVMAbortProcessAction, &error);
    if(error) printf("LLVM init Verify message \"%s\"\n", error);
    LLVMDisposeMessage(error);
    
    error = NULL;
    LLVMLinkInMCJIT();
    LLVMInitializeNativeTarget();
    LLVMInitializeNativeAsmPrinter();
    if (LLVMCreateExecutionEngineForModule(&engine, mod, &error) != 0)
    {
        fprintf(stderr, "LLVM failed to create execution engine\n");
        abort();
    }
    if(error) 
    {
        printf("LLVM Execution Engine message %s\n", error);
        LLVMDisposeMessage(error);
        exit(EXIT_FAILURE);
    }
}

long runOper(oper_t * o, long a, long b) 
{
    LLVMValueRef v, l, r;
    
    o->args[0] = LLVMInt32Type();
    o->args[1] = LLVMInt32Type();
    
    o->ret = LLVMFunctionType(LLVMInt32Type(), o->args, 2, 0);
    o->val = LLVMAddFunction(mod, o->name, o->ret);
    
    LLVMBasicBlockRef entry = LLVMAppendBasicBlock(o->val, "entry");
    LLVMPositionBuilderAtEnd(builder, entry);
    
    l = LLVMConstInt(LLVMInt32Type(), a, 0); 
    r = LLVMConstInt(LLVMInt32Type(), b, 0); 
    v = LLVMBuildAdd(builder, l, r, "add");
    
    LLVMBuildRet(builder, v);
    
    char *error = NULL;
    LLVMVerifyModule(mod, LLVMAbortProcessAction, &error);
    if(error) printf("LLVM func Verify message \"%s\"\n", error);
    LLVMDisposeMessage(error);
    
    LLVMGenericValueRef g = LLVMRunFunction(engine, o->val, 0, NULL);
    
    printf("LLVM func executed without crash\n");
    
    LLVMDeleteFunction(o->val);
    
    long result = (long)LLVMGenericValueToInt(g, 1);
    LLVMDisposeGenericValue(g);
    return result;
}

int main(int argc, char const *argv[])
{
    long val;
    
    codegen_init("test");

    val = runOper(&oper[0], 3, 4);
    printf("3 + 4 is %ld\n", val);
    
    val = runOper(&oper[1], 6, 7);
    printf("6 + 7 is %ld\n", val);
}

I can compile this using the command:

gcc test.c `llvm-config --cflags --cppflags --ldflags --libs core executionengine mcjit interpreter analysis native bitwriter --system-libs` -o test.exe

Alternatively, I've also tried:

gcc `llvm-config --cflags --cppflags` -c test.c
g++ test.o `llvm-config --cxxflags --ldflags --libs core executionengine mcjit interpreter analysis native bitwriter --system-libs` -o test.exe

Either way I get this result:

$ ./test.exe
LLVM init Verify message ""
LLVM func Verify message ""
LLVM func executed without crash
3 + 4 is 7
LLVM func Verify message ""
Segmentation fault

I've also tried using clang just for good measure.

Clearly I am misusing the LLVM C-API. I'm struggling mostly to understand when the API functions are safe to call, and when I can safely free/delete the memory referenced by LLVM. For instance, take the LLVMTypeRef args[2] parameter: in the LLVM C-API source code for LLVMFunctionType, I see that it creates an ArrayRef over the args parameter. This means I must hang onto args until LLVM is done with it, and I can't tell exactly when that is. (I plan to allocate this memory on the heap.)
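On the `args` question: in LLVM's C++ implementation, `LLVMFunctionType` builds an `ArrayRef` over `args` and passes it to `FunctionType::get`, which either returns an existing identical (uniqued) type or creates a new one that copies the parameter types into itself. So the array only has to stay valid for the duration of the call. The sketch below models that behaviour in plain C so it compiles without LLVM; `type_id`, `fn_type_t`, `ctx_t`, `fn_type_get` and `ctx_dispose` are hypothetical stand-ins, not LLVM API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for LLVMTypeRef: a type is just an integer id here. */
typedef int type_id;

/* Hypothetical stand-in for a uniqued FunctionType. */
typedef struct fn_type {
    type_id ret;
    unsigned nparams;
    type_id *params;          /* the type's own copy, never the caller's array */
    struct fn_type *next;
} fn_type_t;

/* Hypothetical stand-in for LLVMContextRef: owns every type it creates. */
typedef struct {
    fn_type_t *types;
} ctx_t;

/* Models FunctionType::get: return the existing identical type, or create one
   that copies `params`. The caller's array is only read during this call. */
static fn_type_t *fn_type_get(ctx_t *ctx, type_id ret,
                              const type_id *params, unsigned n)
{
    assert(ctx != NULL && (n == 0 || params != NULL));

    for (fn_type_t *it = ctx->types; it != NULL; it = it->next)
        if (it->ret == ret && it->nparams == n &&
            (n == 0 || memcmp(it->params, params, n * sizeof *params) == 0))
            return it;                    /* already uniqued: same pointer */

    fn_type_t *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->params = malloc((n ? n : 1) * sizeof *params);
    if (t->params == NULL) {
        free(t);
        return NULL;
    }
    if (n)
        memcpy(t->params, params, n * sizeof *params);
    t->ret = ret;
    t->nparams = n;
    t->next = ctx->types;
    ctx->types = t;
    return t;
}

/* Only the context frees types, as LLVMContextDispose does for real types. */
static void ctx_dispose(ctx_t *ctx)
{
    fn_type_t *t = ctx->types;
    while (t != NULL) {
        fn_type_t *next = t->next;
        free(t->params);
        free(t);
        t = next;
    }
    ctx->types = NULL;
}
```

Under that model, the `args` array passed in can be freed (or go out of scope) right after the call while the returned type stays valid. Applied to the question's code, `args` would not need to live in `oper_t` at all: a local `LLVMTypeRef args[2]` passed straight to `LLVMFunctionType` is enough, and the resulting type belongs to the global context that `LLVMInt32Type` uses.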

Put simply, I'd like someone to explain not just what I am doing wrong in this example, but, more fundamentally, how I should figure out what I am doing wrong.

The LLVM C-API docs give a great breakdown of the functions available in the API, but I haven't found them to say much about how the API functions should be called, i.e. what order is safe/expected.

I have also found this documentation to be helpful, as it can be easily searched for individual function prototypes. But again it gives no context or examples of how to use the C-API.

Finally, I have to credit Paul Smith's blog: it's a bit outdated now, but it is definitely the reason I got this far.

P.S. I don't expect everything to be spelled out for me; I just want advice on how to self-learn LLVM.

1 Answer


The basic design is most easily understood in C++: if you pass a pointer to an object y as a constructor argument, i.e. x = new Foo(…, y, …), then y has to live longer than x. This also applies to wrappers such as CallInst::Create() and ConstantInt::get(), both of which take pointers to objects and return constructed objects.
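The same rule carries over to the C API, which wraps these C++ objects: for example, the value returned by `LLVMBuildAdd(builder, l, r, "add")` refers to `l` and `r`. The plain-C sketch below models that relationship with a hypothetical `node_t` type (not an LLVM API) so the ordering is easy to see: an add node borrows pointers to its operands, so it must be destroyed before them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: an "add" node borrows pointers to its operands,
   much like an add instruction refers to its operand values. */
typedef struct node {
    long value;                 /* used only by leaf nodes */
    const struct node *lhs;     /* borrowed, not owned */
    const struct node *rhs;     /* borrowed, not owned */
} node_t;

static node_t *leaf_create(long v)
{
    node_t *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = v;
        n->lhs = n->rhs = NULL;
    }
    return n;
}

/* The result keeps pointers to lhs and rhs, so both must outlive it. */
static node_t *add_create(const node_t *lhs, const node_t *rhs)
{
    assert(lhs != NULL && rhs != NULL);
    node_t *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = 0;
        n->lhs = lhs;
        n->rhs = rhs;
    }
    return n;
}

/* Reads through the borrowed pointers, which is why they must still be alive. */
static long node_eval(const node_t *n)
{
    if (n->lhs == NULL)
        return n->value;
    return node_eval(n->lhs) + node_eval(n->rhs);
}

/* Frees only the node itself; operands are freed by whoever created them. */
static void node_destroy(node_t *n)
{
    free(n);
}
```

Destroying the add node first and its operands afterwards is safe; calling `node_eval(sum)` after `node_destroy(l)` would read freed memory, the kind of bug that tends to surface later as an unrelated-looking crash.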

But there's more. Some objects assume ownership of the constructed objects, so that you aren't permitted to delete the constructed object at all. You are for example not allowed to delete what ConstantInt::get() returns. As a general rule, anything that's called create… in the C++ API returns something you may delete and anything called get… returns something that's owned by another LLVM object. I'm sure there are exceptions.
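In the C API the same split shows up in the names: `LLVMCreateBuilder` is paired with `LLVMDisposeBuilder`, and `LLVMModuleCreateWithName` with `LLVMDisposeModule` (unless the module has been handed to an execution engine via `LLVMCreateExecutionEngineForModule`, after which the engine owns it and `LLVMDisposeExecutionEngine` frees it). By contrast, `LLVMInt32Type` and `LLVMConstInt` return objects owned by the context, which you never free yourself. The sketch below models that split in plain C; `ctx_t`, `builder_t`, `const_get`, `builder_create` and `builder_dispose` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CONSTS 16

/* Hypothetical context: owns every constant it hands out. */
typedef struct {
    long consts[MAX_CONSTS];
    unsigned nconsts;
} ctx_t;

/* Hypothetical builder: the caller owns it and must dispose of it. */
typedef struct {
    ctx_t *ctx;                 /* borrowed: the context must outlive the builder */
} builder_t;

/* "get" style: returns a pointer into the context's storage; never free it.
   Asking twice for the same value yields the same pointer. */
static const long *const_get(ctx_t *ctx, long v)
{
    for (unsigned i = 0; i < ctx->nconsts; i++)
        if (ctx->consts[i] == v)
            return &ctx->consts[i];
    if (ctx->nconsts == MAX_CONSTS)
        return NULL;            /* toy limit of this sketch */
    ctx->consts[ctx->nconsts] = v;
    return &ctx->consts[ctx->nconsts++];
}

/* "create" style: returns a new object that the caller owns. */
static builder_t *builder_create(ctx_t *ctx)
{
    assert(ctx != NULL);
    builder_t *b = malloc(sizeof *b);
    if (b != NULL)
        b->ctx = ctx;
    return b;
}

/* The matching "dispose" for builder_create. */
static void builder_dispose(builder_t *b)
{
    free(b);
}
```

Calling `free` on a pointer from `const_get` would be a bug in this model, just as deleting the result of `LLVMConstInt` would be in LLVM.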

You may find it helpful to build a debug version of LLVM, unless you're much smarter than I. The extra assertions are great.

arnt
  • An important detail is that the C API is just a wrapper around the C++ one, so I would personally suggest that the OP use the C++ API if he is more comfortable with C++. – AnArrayOfFunctions Jan 08 '22 at 11:28
  • I appreciate the remark about the constructor/destructor paradigm. That seems obvious now that you point it out. I will give the debug build a try; I'm just slightly intimidated by how long building LLVM apparently takes. – Zachary Vander Klippe Jan 09 '22 at 17:25
  • Once you get over that intimidation, you'll be intimidated by the amount of RAM it requires. But it's worth it. LLVM is full of assertions that read your mind and block execution of your bugs before they lead to mysterious second-order effects. (How do I know? Well, I've a mind too, and mine's the poor, mortal variety, prone to errors and misunderstandings. I once wrote a two-byte executable, and it was buggy.) – arnt Jan 09 '22 at 19:55