Maybe I'm missing something obvious, but for the life of me, I can't figure out how I can access the elements of an array after a Gandiva filter operation.
I have linked a minimal example which I compile like this:
$ /usr/lib64/ccache/g++ -g -Wall -m64 -std=c++17 -pthread -fPIC \
-I/opt/data-an/include mwe.cc -o mwe \
-L/opt/data-an/lib64 -lgandiva -larrow
and I then run the binary like this:
$ LD_LIBRARY_PATH=/opt/data-an/lib64 ./mwe
Broadly this is what I was attempting (followed by excerpts from the MWE):
create a 5-element vector: 1, 3, 2, 4, 5
int num_records = 5;
arrow::Int64Builder i64builder;
ArrayPtr array0;
EXPECT_OK(i64builder.AppendValues({1, 3, 2, 4, 5}));
EXPECT_OK(i64builder.Finish(&array0));
use Gandiva to get even elements, indices: 2, 3
// schema for input fields
auto field0 = field("f0", arrow::int64());
auto schema = arrow::schema({field0});
// even: f0 % 2 == 0
auto field0_node = TreeExprBuilder::MakeField(field0);
auto lit_2 = TreeExprBuilder::MakeLiteral(int64_t(2));
auto remainder = TreeExprBuilder::MakeFunction("mod", {field0_node, lit_2}, int64());
auto lit_0 = TreeExprBuilder::MakeLiteral(int64_t(0));
auto even = TreeExprBuilder::MakeFunction("equal", {remainder, lit_0}, boolean());
auto condition = TreeExprBuilder::MakeCondition(even);
// input record batch
auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array0});
// filter
std::shared_ptr<Filter> filter;
EXPECT_OK(Filter::Make(schema, condition, &filter));
std::shared_ptr<SelectionVector> selected;
EXPECT_OK(SelectionVector::MakeInt16(num_records, pool_, &selected));
EXPECT_OK(filter->Evaluate(*in_batch, selected));
access the even elements in the original array by using the selection vector from the Gandiva filter as an index array
// std::cout << "array0[0]: " << array0->Value(0); // doesn't compile
// error: ‘using element_type = class arrow::Array’ {aka ‘class
// arrow::Array’} has no member named ‘Value’
// downcast it to the correct derived class
auto array0_cast = std::dynamic_pointer_cast<NumericArray<Int64Type>>(array0);
std::cout << "array0[0]: " << array0_cast->Value(0) << std::endl;
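To make the intent clear without Arrow or Gandiva, here is a plain standard-library sketch of the three steps above: build the values, collect the positions of the even ones, then use those positions as an index array. The helpers `select_even` and `gather` are hypothetical names for illustration only and are not part of any Arrow or Gandiva API; the 16-bit index type is simply chosen to mirror the selection vector in the excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: positions of the even values. This plays the role of
// the selection vector that the filter is expected to fill in.
// Note: 16-bit indices only work for inputs with at most 65536 elements.
std::vector<std::uint16_t> select_even(const std::vector<std::int64_t>& values) {
  std::vector<std::uint16_t> indices;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] % 2 == 0) {
      indices.push_back(static_cast<std::uint16_t>(i));
    }
  }
  return indices;
}

// Hypothetical helper: use the indices to pull elements out of the original data.
std::vector<std::int64_t> gather(const std::vector<std::int64_t>& values,
                                 const std::vector<std::uint16_t>& indices) {
  std::vector<std::int64_t> out;
  out.reserve(indices.size());
  for (std::uint16_t idx : indices) {
    assert(idx < values.size());  // every index must point into the input
    out.push_back(values[idx]);
  }
  return out;
}
```

Calling `select_even` on the values 1, 3, 2, 4, 5 and passing the result to `gather` is the behaviour I am trying to reproduce with the Gandiva selection vector.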
But I can't seem to access the elements of the selection vector. Since it was declared as std::shared_ptr<arrow::Array>, the Value(..) method isn't found. Since I filled it with SelectionVector::MakeInt16(..), I tried downcasting to arrow::NumericArray<Int16Type>, but that fails! I'm not sure where I'm going wrong.
auto idx_arr_cast = std::dynamic_pointer_cast<NumericArray<Int16Type>>(idx_arr);
if (idx_arr_cast) {
  std::cout << "idx_arr[0]: " << idx_arr_cast->Value(0) << std::endl;
} else {
  std::cerr << "idx_arr_cast is a nullptr!" << std::endl;
}
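One thing I have not ruled out is a signedness mismatch: despite the name MakeInt16, the array behind the selection vector might hold unsigned 16-bit values, in which case a cast to the signed type would return a null pointer. That is an assumption I have not checked against the Gandiva sources. The sketch below only demonstrates the failure mode with hypothetical stand-in classes (`Array`, `NumericArray<T>`, `holds`, `make_index_array`); none of them are Arrow's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a polymorphic array base class; NOT Arrow's.
struct Array {
  virtual ~Array() = default;
};

// Hypothetical typed array. Each instantiation is a distinct, unrelated type,
// so NumericArray<int16_t> and NumericArray<uint16_t> cannot be cast to each other.
template <typename T>
class NumericArray : public Array {
 public:
  explicit NumericArray(std::vector<T> values) : values_(std::move(values)) {}
  T Value(std::size_t i) const {
    assert(i < values_.size());
    return values_[i];
  }

 private:
  std::vector<T> values_;
};

// True when the object behind `arr` really is a NumericArray<T>.
template <typename T>
bool holds(const std::shared_ptr<Array>& arr) {
  return std::dynamic_pointer_cast<NumericArray<T>>(arr) != nullptr;
}

// Assumption for illustration: the producer hands back unsigned 16-bit indices.
std::shared_ptr<Array> make_index_array() {
  return std::make_shared<NumericArray<std::uint16_t>>(
      std::vector<std::uint16_t>{2, 3});
}
```

With these stand-ins, `holds<std::int16_t>` is false for the array from `make_index_array()` while `holds<std::uint16_t>` is true, which is the same symptom I see with idx_arr_cast.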
I also have a related, but more general question. Given an array, I can't find a way to access the elements (or iterate over them) if I don't know the exact type. If I know the type, I can downcast and use the likes of Value(..), GetValue(..), GetString(..), etc. That seems quite roundabout just to access the elements. What am I missing?
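To make the general question concrete: the only type-agnostic route I can see is to switch on a runtime type tag and downcast in each branch, roughly as in the sketch below. Everything here (`TypeId`, `Array`, `Int64Array`, `StringArray`, `element_to_string`) is a hypothetical miniature written for this question, not Arrow's API; it only shows the shape of the code I would like to avoid writing by hand for every type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical runtime type tag; NOT Arrow's type system.
enum class TypeId { Int64, String };

// Hypothetical type-erased base class.
struct Array {
  virtual ~Array() = default;
  virtual TypeId type_id() const = 0;
  virtual std::size_t length() const = 0;
};

struct Int64Array : Array {
  explicit Int64Array(std::vector<std::int64_t> v) : values(std::move(v)) {}
  TypeId type_id() const override { return TypeId::Int64; }
  std::size_t length() const override { return values.size(); }
  std::int64_t Value(std::size_t i) const { return values[i]; }
  std::vector<std::int64_t> values;
};

struct StringArray : Array {
  explicit StringArray(std::vector<std::string> v) : values(std::move(v)) {}
  TypeId type_id() const override { return TypeId::String; }
  std::size_t length() const override { return values.size(); }
  std::string GetString(std::size_t i) const { return values[i]; }
  std::vector<std::string> values;
};

// Render element i as text by switching on the type tag and downcasting
// to the matching concrete class in each branch.
std::string element_to_string(const Array& arr, std::size_t i) {
  assert(i < arr.length());
  switch (arr.type_id()) {
    case TypeId::Int64:
      return std::to_string(static_cast<const Int64Array&>(arr).Value(i));
    case TypeId::String:
      return static_cast<const StringArray&>(arr).GetString(i);
  }
  return std::string();  // not reached for the two tags handled above
}
```

Every new element type needs another case here, which is why I suspect there is a more direct way that I have overlooked.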
Note: The complete MWE, along with a Makefile, can be cloned from this gist.