
My project needs to load many modules at runtime, and each one contains many functions of a form similar to the pseudocode below:

void someFunction(Context &ctx) {
    bool result;
    result = ctx.call("someFunction2")(ctx.arg["arg1"], ctx.arg["arg2"])
             && ctx.call("someFunction3")(ctx.arg["arg1"], ctx.arg["arg3"]);
    ctx.result(result);
} 

where ctx.arg["arg1"], ctx.arg["arg2"], and ctx.arg["arg3"] are arguments passed to someFunction at runtime. someFunction2 and someFunction3 cannot be resolved statically at compile time, but will be known (i.e., whether they have been defined in other modules) at runtime once all modules are loaded.

Now, a naive implementation would use a hash map to store a handle to each of these functions, but hashing would be slow: there are typically 10k functions to look up, and each function is called many times from other functions (e.g., arguments are enumerated to find a combination that produces a desired result).

Therefore, I am looking for a solution that performs a one-time replacement of these "ctx.call" lookups once all modules are loaded, instead of a "hash-and-probe" on every call. Currently the main problem is the "replacing" action. I have come up with some ideas, but they are not perfect:


1st solution: create an inner function inner_func(func_handle1, func_handle2, arg1, arg2, arg3), and use std::bind to create an outer wrapper outer_wrapper().

problem: not user friendly; the user must explicitly tell the context which functions and arguments to find.


2nd solution: use metaprogramming + constexpr + macros to automatically count function and argument name references, then create a reference table, and let the context fill each table at runtime.

problem: I cannot work it out and need some help. I have read the documentation of the Fatal library from Facebook and of mpl and hana from Boost, but there doesn't seem to be a clean way to do this.


3rd solution: use a JIT compiler

problem: C++ JIT compiler choices are limited. NativeJIT is not powerful enough, easy::JIT doesn't seem to be customizable and isn't easy to distribute, and asmjit is not usable for this.


PS: The problem context is "automated planners", and these functions are used to construct predicates. Context ctx is just an example; you may use another appropriate syntax if necessary, as long as it can easily represent the following Lisp expression:

(and (at ?p ?c1)
     (aircraft ?a)
     (at ?a ?c3)
     (different ?c1 ?c3))

PPS: more specifically, I am thinking about something like this:

User will define a function looking like this:

void module_init() {
    FUNC ("someFunction")("p", "a", "c1", "c3") (
        bool result;
        result = CALL("at")("p", "c1")
                 && CALL("aircraft")("a")
                 && CALL("at")("a", "c3")
                 && CALL("different")("c1", "c3");

        /// Users should also be able to access arguments as a "Variable"
        /// class using ARG["p"]
        return result;
    )
}

Then, by some means, FUNC() will be converted to a functor similar to:

struct func_someFunction {
    some_vector<std::function<bool()>> functions;
    some_vector<Variable*> args;
    some_vector<std::string> func_name;
    some_vector<std::string> arg_name;

    bool operator()() {
       /// above representation of Func(), but function and args are pointers in "functions" and "args"
    }
};

Then, when all modules are loaded, the system will read func_name and arg_name, and fill functions and args with the appropriate function pointers and variable pointers, respectively.

Status: Using hashmap first, I will post updates once completed.

Status: Figured out a solution myself, also tested hash implementation, posted below.

Any idea would be appreciated. Thank you!

Iffi
  • What is preventing you from putting these function handles into another container? – pooya13 Feb 01 '20 at 09:45
  • You could replace hashing with index lookup. Would have to use integers instead of strings like "someFunction2", and those integers are actually indexes to an array where all function pointers are stored. Can't get any faster. C++ virtual function tables are made like that. Actually, skip index lookup. Make this integer to be the actually pointer to function, and call it directly. – Dialecticus Feb 01 '20 at 09:50
  • @Dialecticus Yeah, I am trying to do what you described in the second solution, map "name" to /, and I am lacking a way to do this automatically. It will make code unreadable but I want to preserve the function name. – Iffi Feb 01 '20 at 10:08
  • If you must use strings then checking the string contents will slow the performance more than any hash lookup that would come after it. – Dialecticus Feb 01 '20 at 10:13
  • @Dialecticus Right, therefore I am trying to do it at compile time, I will add some more details to my problem description. – Iffi Feb 01 '20 at 10:20
  • @pooya13 Could you please clarify it a little bit? – Iffi Feb 01 '20 at 10:21
  • See my answer below. – pooya13 Feb 01 '20 at 19:32

4 Answers


Now, a naive implementation would be using a hash map to store a function handle to all of these functions, but hashing would be slow as there are typically 10k functions to search for [...]

Hash tables have O(1) average lookup cost. Have you tried this widely used solution to this problem and done performance analysis? Have you tried different hashing algorithms to reduce hashing time and collisions?

Paul Evans
  • Thank you kindly. I am currently at the design stage; I could use a hash table, but it is predictable that it will be slower than accessing function pointers directly, which is also O(1). Plus, I cannot abstract away this "access", and the whole context is closely coupled, so it is undesirable to build the whole system with hashing, analyze the performance, and then reimplement it using some other mechanism. – Iffi Feb 01 '20 at 10:13
  • However, if there really isn't any other better solution, I will use hash table. – Iffi Feb 01 '20 at 10:14
  • @Iffi I think you'll find hash tables are exactly what you want. – Paul Evans Feb 01 '20 at 10:16
  • Probably, but I really would like to exploit the power of c++. :) – Iffi Feb 01 '20 at 10:22
  • This is a good answer but there are important caveats. First off, O(1) lookup can only be achieved with a good (uniform) hashing strategy and appropriate hash table size and load factor. Secondly, O(1) lookup for *strings* still needs to iterate over the string to compute the hash. Luckily both these problems have established solutions in the context of interpreters, because once the modules are loaded the tables will essentially be “fixed” and thus can be optimally sized. And the string lookup can be sped up substantially via string interning, which makes hash computation a pointer lookup. – Konrad Rudolph Feb 01 '20 at 13:27

If you need to continuously find the correct function to run based on runtime string keys throughout program lifetime, then there is no way around using a hash map. (Paul's answer)

But if you initialize a list of functions at runtime that does not change for program duration (i.e. you don't need to perform any "find" operation after the initial stage), then you could put these functions in a contiguous container (e.g. std::vector) to improve access time and cache utilization:

// getFuncNames is where you are deciding on the list of functions to run
// funcs is a vector of function handles
// funcMap is a hash map of function names to function handles
for (auto& funcName : getFuncNames())
{
    funcs.push_back(funcMap.at(funcName));
}
pooya13

This may be overkill, but may be a useful idea:

  1. Use string interning to ensure that each and every MyString("aircraft") yields the same object. Of course, that means that your strings must be immutable.

  2. Associate each string object that is created with a high-quality random number (uint64_t) and use that as the "hash" of that string.

Since the "hash" is stored with the string, it's a simple memory load to "compute" it. And since you use a good PRNG to generate that "hash", it behaves excellently as a key into a hash table.

You still need to compute a classical hash to find the MyString object within the table of existing string objects whenever an std::string is converted into your interned string object, but this is a one-time effort that can be done when your configuration files are processed by the lexer, or when your modules are loaded. The actual matching of strings to their respective function implementations would then be decoupled from the calculation of classical hashes.

cmaster - reinstate monica

OK, so I figured out a solution myself, close to the first solution in my question. I have made a very simple example of the problem and posted it on GitHub; the link is below:

Demonstration using hash table and pointer respectively

Note: this solution is just a simple demonstration, not optimized. Further possible optimizations include:

  1. For the hash map method, string interning may be used to reduce the string construction overhead, as suggested by Konrad Rudolph and cmaster - reinstate monica. Interning still leaves a moderate performance drop (about 50% slower than pointers), but it eliminates the dynamic string creation overhead and reduces memory consumption. boost::flyweight is a good option.

  2. For the hash map method, I implemented the demo using only std::unordered_map, but better substitutes exist, including google::dense_hash_map, tsl::hopscotch_map and the like; they are worth trying, but according to Tessil's benchmarks, I doubt their O(s) cost per search (where s is the mean string length) could beat an O(1) pointer access.

  3. In my scenario, all functions can be resolved after the module loading stage. However, you may want to cover a scenario such as symbol lookup in Python, in which case a hashmap would be better, unless you introduce more constraints to your scenario or update the resolved pointers periodically. A trie data structure might be a good option if you are inserting and deleting at a large scale.

Enough babbling, here are the results and solutions:


Performance

Benchmark: 1.28e8 possible combinations for a mixed boolean & numeric SAT problem

Platform: i7 6700HQ, single thread

cmake-build-debug/test_ptr  0.70s user 0.00s system 99% cpu 0.697 total
cmake-build-debug/test_hash  4.24s user 0.00s system 99% cpu 4.241 total

Hotspot & function runtime from perf:

test_ptr:

  53.17%  test_ptr  test_ptr       [.] main
  35.38%  test_ptr  test_ptr       [.] module_1_init(Domain&)::__internal_func_some_circuit::operator()
   8.02%  test_ptr  test_ptr       [.] module_2_init(Domain&)::__internal_func_and_circuit::operator()
   1.90%  test_ptr  test_ptr       [.] module_2_init(Domain&)::__internal_func_or_circuit::operator()
   0.18%  test_ptr  libc-2.23.so   [.] _int_malloc
   0.15%  test_ptr  ld-2.23.so     [.] do_lookup_x
   0.15%  test_ptr  test_ptr       [.] module_2_init(Domain&)::__internal_func_xor_circuit::operator()

test_hash:

  33.11%  test_hash  test_hash            [.] Domain::call<char const (&) [11], Domain::Variable&, Domain::Variable&>
  25.37%  test_hash  test_hash            [.] main
  21.46%  test_hash  libstdc++.so.6.0.26  [.] std::_Hash_bytes
   5.10%  test_hash  libc-2.23.so         [.] __memcmp_sse4_1
   4.64%  test_hash  test_hash            [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>
   3.41%  test_hash  test_hash            [.] module_1_init(Domain&)::__internal_func_some_circuit::operator()
   1.86%  test_hash  libc-2.23.so         [.] strlen
   1.44%  test_hash  test_hash            [.] module_2_init(Domain&)::__internal_func_and_circuit::operator()
   1.39%  test_hash  libc-2.23.so         [.] __memcpy_avx_unaligned
   0.55%  test_hash  test_hash            [.] std::_Hash_bytes@plt

The hashmap implementation has a very high overhead coming from repeated hashing and function lookup.


Solution

Macros are used heavily to make it easier for users to define functions (predicates):

in test_ptr:

void module_1_init(Domain &d) {
    FUNC(some_circuit, d,
         DEP(and_circuit, or_circuit, xor_circuit, not_circuit),
         ARG(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10),
         BODY(
             return CALL(and_circuit, a1, a2)
                 && CALL(or_circuit, a3, a4)
                 && CALL(xor_circuit, a5, a6)
                 && CALL(not_circuit, a7)
                 && a8.value >= R1 && a9.value >= R2 && a10.value >= R3;
         )
    );
}
in test_hash:

void module_1_init(Domain &d) {
    FUNC(some_circuit, d,
         ARG(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10),
         BODY(
             return CALL(and_circuit, a1, a2)
                 && CALL(or_circuit, a3, a4)
                 && CALL(xor_circuit, a5, a6)
                 && CALL(not_circuit, a7)
                 && a8.value >= R1 && a9.value >= R2 && a10.value >= R3;
         )
    );
}

The major difference is the DEP() macro in the pointer solution: DEP() explicitly specifies the functions this function depends on, and a local function pointer table is constructed from it.

Here is the actual code produced after macro expansion:

in test_ptr:

void module_1_init(Domain &d) {
    class __internal_func_some_circuit : public Domain::Function { 
    public: 
        enum func_dep_idx { 
            and_circuit, 
            or_circuit, 
            xor_circuit, 
            not_circuit, 
            __func_dep_idx_end }; 
    Domain::Variable a1; 
    Domain::Variable a2;
    ...
    Domain::Variable a10; 
    explicit __internal_func_some_circuit(Domain &d) : 
    a1(), a2(), a3(), a4(), a5(), a6(), a7(), a8(), a9(), a10(),
    Domain::Function(d) { 
        arg_map = {{"a1", &a1}, {"a2", &a2}, {"a3", &a3} ..., {"a10", &a10}}; 
        arg_pack = { &a1, &a2, &a3, &a4, &a5, &a6, &a7, &a8, &a9, &a10}; 
        func_dep_map = {{"and_circuit", func_dep_idx::and_circuit}, 
                        {"or_circuit", func_dep_idx::or_circuit},
                        {"xor_circuit", func_dep_idx::xor_circuit} , 
                        {"not_circuit", func_dep_idx::not_circuit}}; 
        func_dep.resize(__func_dep_idx_end); 
    } 

    bool operator()() override { 
        return func_dep[func_dep_idx::and_circuit]->call(a1, a2) && 
               func_dep[func_dep_idx::or_circuit]->call(a3, a4) && 
               func_dep[func_dep_idx::xor_circuit]->call(a5, a6) && 
               func_dep[func_dep_idx::not_circuit]->call(a7) && 
               a8.value >= 100 && a9.value >= 100 && a10.value >= 100; 
    } 
}; 
d.registerFunction("some_circuit", new __internal_func_some_circuit(d));
in test_hash:

class __internal_func_some_circuit : public Domain::Function { 
public: 
    Domain::Variable a1; 
    Domain::Variable a2; 
    ...
    Domain::Variable a10; 
    explicit __internal_func_some_circuit(Domain &d) : 
    a1() , a2(), a3(), a4(), a5(), a6(), a7(), a8(), a9(), a10(), 
    Domain::Function(d) { 
        arg_map = {{"a1", &a1}, {"a2", &a2} ..., {"a10", &a10}}; 
        arg_pack = {&a1, &a2, &a3, &a4, &a5, &a6, &a7, &a8, &a9, &a10}; 
    } 

    bool operator()() override { 
        return domain.call("and_circuit", a1, a2) && 
               domain.call("or_circuit", a3, a4) && 
               domain.call("xor_circuit", a5, a6) && 
               domain.call("not_circuit", a7) && 
               a8.value >= 100 && a9.value >= 100 && a10.value >= 100; } 
}; 
d.registerFunction("some_circuit", new __internal_func_some_circuit(d));

Basically, the pointer solution creates a function lookup table, func_dep_map, which is used later by the Domain class to search for the other functions this function depends on, and a function pointer vector, func_dep, which is then filled with their pointers.

An enum is used to provide an elegant and compact way to look up indexes; the map classes provided by metaprogramming libraries such as Fatal and boost::mpl are not handy to use in this case.

This implementation relies heavily on boost::preprocessor; for more details, please refer to my GitHub repo.

Iffi