
I created a local array in my OpenCL kernel via LLVM; call it lookuptable, of type [256 x i32]. Later I insert code via LLVM to fill the array with values. My issue is that when I try to generate code that accesses the array, I cannot isolate the pointer to the element I want. I can get a GEP that indexes into the array (although not at the element I want) if I use an unrelated local value called flattened:

Value *xs_ys_mul   = builder.CreateMul(shifted_x_size, y_size, "xs_ys_mul");
Value *xs_ys_z_mul = builder.CreateMul(xs_ys_mul, z, "xs_ys_z_mul");
Value *xs_y_mul    = builder.CreateMul(shifted_x_size, y, "xs_y_mul");
Value *sum_1       = builder.CreateAdd(xs_ys_z_mul, xs_y_mul, "sum_1");
Value *flattened   = builder.CreateAdd(sum_1, shifted_x, "FLATTENED");

This would be the dimensionally flattened local workgroup id. That is irrelevant though.
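For readers who want to check the arithmetic, the same flattening can be written on plain integers. This is only a sketch: `flatten_local_id` is a hypothetical helper, not part of the kernel or of LLVM, and the parameter names simply mirror the IR value names above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the IR arithmetic above on plain integers:
//   flattened = shifted_x_size * y_size * z + shifted_x_size * y + shifted_x
std::uint32_t flatten_local_id(std::uint32_t shifted_x, std::uint32_t y, std::uint32_t z,
                               std::uint32_t shifted_x_size, std::uint32_t y_size) {
    // Assumption: y is a local id within a dimension of extent y_size.
    assert(y < y_size);
    std::uint32_t xs_ys_mul   = shifted_x_size * y_size;  // elements per z-slice
    std::uint32_t xs_ys_z_mul = xs_ys_mul * z;            // offset of the z-slice
    std::uint32_t xs_y_mul    = shifted_x_size * y;       // offset of the row within the slice
    std::uint32_t sum_1       = xs_ys_z_mul + xs_y_mul;
    return sum_1 + shifted_x;                             // position within the row
}
```

With a 4 x 5 slice, for example, the item at x = 1, y = 2, z = 3 lands at 4*5*3 + 4*2 + 1.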

This is how the GEP is created (builder is an instance of IRBuilder):

std::vector<llvm::Value *> tmp_args;
tmp_args.push_back(builder.getInt32(0));
tmp_args.push_back(flattened);
Value *table_addr = builder.CreateGEP(M.getNamedGlobal(tablename), tmp_args, tablename+"_IDX");

M in this case is the Module object. The table_addr produced is:

%i32_cllocal_TABLE_IDX = getelementptr [256 x i32] addrspace(3)* @i32_cllocal_TABLE, i32 0, i32 %FLATTENED

However, if I want to fill it correctly by walking through the indices with a for loop, my code is (omitting the loop structure; "index" is an i32 loop counter):

std::vector<llvm::Value *> tmp_args;
tmp_args.push_back(builder.getInt32(0));
tmp_args.push_back(builder.getInt32(index));
Value *table_addr = builder.CreateGEP(M.getNamedGlobal(tablename), tmp_args, tablename+"_IDX");

The dump() of the table_addr in that case is (when index==0):

i32 addrspace(3)* getelementptr inbounds ([256 x i32] addrspace(3)* @i32_cllocal_CRC32_TABLE, i32 0, i32 0)

Which means that further down when I do the store:

store_inst = builder.CreateStore(builder.getInt32(tablevalues[index]), table_addr);

I get this output:

store volatile i32 0, i32 addrspace(3)* getelementptr inbounds ([256 x i32] addrspace(3)* @i32_cllocal_TABLE, i32 0, i32 0), align 4

Which doesn't look right, but more importantly I get a SIGABRT on an assert when "index" > 0:

Casting.h:194: typename llvm::cast_retty<To, From>::ret_type llvm::cast(const Y&) [with X = llvm::CompositeType, Y = llvm::Type*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

I'm a bit stuck. I don't understand the difference between indexing into the array with an explicit value and indexing with some obscure value that is calculated at run time. Any insights would be greatly appreciated.

UPDATE: What I ended up doing is this (the alloca is only done once, outside the for loop; I'm including it in this code block only for visual purposes):

std::vector<llvm::Value *> tmp_args;
tmp_args.push_back(builder.getInt32(0));
idxInst = builder.CreateAlloca(builder.getInt32Ty(), 0, "idxvalue");
//----- Inside the loop below --------------------------------------
idxStore = builder.CreateStore(builder.getInt32(index), idxInst);
indexValue = builder.CreateLoad(idxInst, "INDEX_VAL");
tmp_args.push_back(indexValue);
table_addr = builder.CreateGEP(table_ptr, tmp_args, "_IDX_PUT");
tmp_args.pop_back();
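// Note: the third parameter of CreateStore is `bool isVolatile`, not a name, so the
// string literal converts to true; that is why the IR below says "store volatile".
store_inst = builder.CreateStore(builder.getInt32(tableValues[index]), table_addr, "_ELEM_STORE");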
store_inst->setAlignment(4);

Which emitted this code (for index == 0 and 1):

%idxvalue = alloca i32
store i32 0, i32* %idxvalue
%INDEX_VAL = load i32* %idxvalue
%i32_cllocal_TABLE_IDX_PUT = getelementptr [256 x i32] addrspace(3)*  @i32_cllocal_TABLE, i32 0, i32 %INDEX_VAL
store volatile i32 0, i32 addrspace(3)* %i32_cllocal_TABLE_IDX_PUT, align 4
store i32 1, i32* %idxvalue
%INDEX_VAL1 = load i32* %idxvalue
%i32_cllocal_TABLE_IDX_PUT2 = getelementptr [256 x i32] addrspace(3)* @i32_cllocal_TABLE, i32 0, i32 %INDEX_VAL1
store volatile i32 1996959894, i32 addrspace(3)* %i32_cllocal_TABLE_IDX_PUT2, align 4

It looks correct now. It seems like a weird way to emit code, since I'm storing and then loading immediately, but I suppose that will get optimized out, or I'll try running mem2reg. Thanks for the help @Oak.

redratio1

1 Answer


This snippet is problematic:

std::vector<llvm::Value *> tmp_args;
tmp_args.push_back(builder.getInt32(0));
tmp_args.push_back(builder.getInt32(index));
Value *table_addr = builder.CreateGEP(M.getNamedGlobal(tablename), tmp_args, tablename+"_IDX");

IRBuilder::getInt32 creates a constant int, so every operand of the GEP is a constant and the builder folds it into a constant expression instead of emitting a GEP instruction. This is why the IR shows a constant expression GEP (pointing at the first item when index == 0) rather than a real GEP instruction. What you need is to pass a non-constant Value as the 2nd index, like you have done in the first example.
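Separately, the assert itself complains that a type which is not a CompositeType was cast to one. One plausible way to hit that, assuming the index vector is declared outside the loop and keeps growing on each iteration, is a GEP with more indices than the type has levels to step through. The sketch below models this in plain C++; `Type` and `gep_indexed_type` are hypothetical stand-ins, not LLVM's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-in for LLVM's types (not LLVM's API).
struct Type {
    enum Kind { Int32, Array, Pointer } kind;
    const Type *element;  // pointee or array element type; nullptr for Int32
};

// Walks a GEP-style index list and returns the type the indices end up at,
// or nullptr when an index tries to step into a non-composite type such as i32.
// In this model the index values do not influence the resulting type, only
// how many of them there are.
const Type *gep_indexed_type(const Type *base, const std::vector<unsigned> &indices) {
    const Type *current = base;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (current->element == nullptr) {
            return nullptr;  // nothing left to index into
        }
        current = current->element;  // step through the pointer, then the array
    }
    return current;
}
```

In this model, indices {0, index} applied to a pointer to [256 x i32] end at i32, while a leftover third index such as {0, 0, 1} has nothing to step into. Building a fresh index vector for every GEP, or popping the last index after each use as in the question's update, avoids the extra index.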

Oak
  • I've attempted to create a value and pass it into CreateGEP, but I'm having problems getting LLVM to recognize it as a value. I've even tried taking M.getNamedGlobal(tablename), converting it with ptrtoint, adding the offset, then converting it back to a pointer and using it in the store, but to no effect. One thing I'm going to try, which seems excessive, is to create a private variable, store the index into it, load the index back, and then use that value to index into the array. It seems like a crazy way to go when all I'm trying to do is stuff values into a local array. I'm a newbie; any suggestions? – redratio1 Jan 09 '14 at 20:30
  • That didn't work either. It seems there is something basic that I'm not understanding conceptually here. All I'd like to do is assign compile-time-known values to each element of the array sequentially. The only way to do this is to index through the array, but I cannot get the syntax correct. @Oak – redratio1 Jan 09 '14 at 22:24
  • @redratio1 the idea of using a private variable is perfectly fine - you can later on run the mem2reg pass that will optimize the local away while keeping the same semantics, that's what Clang does. In either case, if you have a specific code segment which doesn't work, please edit it into your question (or open a new question) - I have a hard time imagining the code. – Oak Jan 10 '14 at 07:43