I have a legacy C++ application that constructs a tree of C++ objects. I want to use LLVM to call class constructors to create said tree. The generated LLVM code is fairly straightforward and looks like repeated sequences of:
; ...
%11 = getelementptr [11 x i8*]* %Value_array1, i64 0, i64 1
%12 = call i8* @T_string_M_new_A_2Pv(i8* %heap, i8* getelementptr inbounds ([10 x i8]* @0, i64 0, i64 0))
%13 = call i8* @T_QueryLoc_M_new_A_2Pv4i(i8* %heap, i8* %12, i32 1, i32 1, i32 4, i32 5)
%14 = call i8* @T_GlobalEnvironment_M_getItemFactory_A_Pv(i8* %heap)
%15 = call i8* @T_xs_integer_M_new_A_Pvl(i8* %heap, i64 2)
%16 = call i8* @T_ItemFactory_M_createInteger_A_3Pv(i8* %heap, i8* %14, i8* %15)
%17 = call i8* @T_SingletonIterator_M_new_A_4Pv(i8* %heap, i8* %2, i8* %13, i8* %16)
store i8* %17, i8** %11, align 8
; ...
Where each T_ function is a C "thunk" that calls some C++ constructor, e.g.:
extern "C" void* T_string_M_new_A_2Pv( void *v_value ) {
  string *const value = static_cast<string*>( v_value );
  return new string( *value );  // copy-construct from the pointed-to string
}
The thunks are necessary, of course, because LLVM knows nothing about C++. The T_ functions are added to the ExecutionEngine in use via ExecutionEngine::addGlobalMapping().
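For concreteness, the mappings are installed roughly like this (a minimal sketch, not my exact code; the mapThunks helper name is just for illustration, and it assumes the Module and ExecutionEngine already in use plus the thunk declarations being visible in this translation unit):

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/IR/Function.h"   // header paths for LLVM >= 3.3; older releases use e.g. llvm/Function.h
#include "llvm/IR/Module.h"

// The IR only *declares* T_string_M_new_A_2Pv and friends; addGlobalMapping
// tells the JIT which native address each declaration resolves to, so no
// dynamic symbol lookup is needed while JIT'ing.
static void mapThunks( llvm::Module *module, llvm::ExecutionEngine *engine ) {
  if ( llvm::Function *const f = module->getFunction( "T_string_M_new_A_2Pv" ) )
    engine->addGlobalMapping( f, reinterpret_cast<void*>( &T_string_M_new_A_2Pv ) );
  // ... one such mapping per T_* thunk ...
}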
When this code is JIT'd, the performance of the JIT'ing itself is very poor. I've generated a call-graph of it using kcachegrind. I don't understand all the numbers (and this PDF seems not to include commas where it should), but if you look at the left fork, the bottom two ovals, Schedule... is called 16K times and setHeightToAtLeas... is called 37K times. On the right fork, RAGreed... is called 35K times.
Those are far too many calls to anything for what's mostly a simple sequence of call LLVM instructions. Something seems horribly wrong.
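Those counts look like instruction scheduling and (greedy) register allocation, so one thing I wonder is whether the code-generation optimization level the engine is built with matters here. A minimal sketch of setting it via the 3.x legacy-JIT EngineBuilder follows; the surrounding setup is an assumption for illustration, not my actual code:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/JIT.h"        // pulls in the legacy JIT
#include "llvm/Support/TargetSelect.h"

llvm::ExecutionEngine* createEngine( llvm::Module *module ) {
  llvm::InitializeNativeTarget();
  return llvm::EngineBuilder( module )       // pre-3.6 API: takes a raw Module*
      .setEngineKind( llvm::EngineKind::JIT )
      .setOptLevel( llvm::CodeGenOpt::None ) // lower codegen effort, e.g. fast regalloc instead of greedy
      .create();
}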
Any ideas on how to improve the performance of the JIT'ing?