I have an Apache Arrow array that is created by reading a file.

std::shared_ptr<arrow::Array> array;
PARQUET_THROW_NOT_OK(reader->ReadColumn(0, &array));

Is there a way to convert it to std::vector or any other native array type in C++?

motam79
  • The Apache arrow::Array is, according to the documentation, a pointer to bitmap data. It's entirely possible to store that in a vector, but you will likely have some casting to do. The arrow::Array class has a data() function that returns a shared pointer to ArrayData; you can call get() to reach the data it points to and range-construct your vector from that pointer and the length function of the arrow::Array, but you're likely going to have to do a cast to make it all work. This is just my thoughts on it; I do not have the Apache Arrow library handy to validate any of this. – johnathan Nov 17 '18 at 00:33
  • I think you are right, I need to get the raw pointer and cast it to the intended type and form a vector. – motam79 Nov 17 '18 at 01:06
  • Hi @motam79. Did you manage to find a clean solution for that? – Wojciech Kulma Mar 10 '19 at 21:59
  • Why do you want to convert the array to a vector? Is there any operation or algorithm that arrow::Array can't support? – Jun Sep 23 '22 at 02:53

1 Answer

You can use std::static_pointer_cast to cast the arrow::Array to, for example, an arrow::DoubleArray if the array contains doubles, and then use the Value function to get the value at a particular index. For example:

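// static_pointer_cast does no runtime check, so the column must really hold doubles.
auto arrow_double_array = std::static_pointer_cast<arrow::DoubleArray>(array);
std::vector<double> double_vector;
double_vector.reserve(arrow_double_array->length());  // one allocation up front
for (int64_t i = 0; i < arrow_double_array->length(); ++i)
{
    // Value(i) does not check validity; null slots are copied as-is.
    double_vector.push_back(arrow_double_array->Value(i));
}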

See the latter part of the ColumnarTableToVector function in this example: https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html. In that example, table->column(0)->chunk(0) is a std::shared_ptr<arrow::Array>.
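The loop above copies every slot, including null ones. A minimal sketch of a null-aware variant follows, assuming you are happy to substitute a fill value for nulls. ToVectorWithFill is a hypothetical helper, not part of Arrow; it relies only on length(), IsNull(i) and Value(i), which arrow::DoubleArray provides. FakeDoubleArray is a stand-in so the sketch compiles without the Arrow headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): copies a typed array into a
// std::vector, writing `null_fill` wherever the array reports a null slot.
// ArrayT only needs length(), IsNull(i) and Value(i).
template <typename ValueT, typename ArrayT>
std::vector<ValueT> ToVectorWithFill(const ArrayT& typed_array, ValueT null_fill) {
    std::vector<ValueT> out;
    out.reserve(static_cast<std::size_t>(typed_array.length()));
    for (int64_t i = 0; i < typed_array.length(); ++i) {
        out.push_back(typed_array.IsNull(i) ? null_fill : typed_array.Value(i));
    }
    return out;
}

// Stand-in for arrow::DoubleArray, used only to keep this sketch
// independent of the Arrow library.
struct FakeDoubleArray {
    std::vector<double> values;
    std::vector<bool> valid;
    int64_t length() const { return static_cast<int64_t>(values.size()); }
    bool IsNull(int64_t i) const { return !valid[static_cast<std::size_t>(i)]; }
    double Value(int64_t i) const { return values[static_cast<std::size_t>(i)]; }
};
```

With the real library, the call would presumably look like ToVectorWithFill<double>(*arrow_double_array, std::nan("")), so that nulls stay distinguishable from ordinary values.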

To learn more, I found it useful to click on various parts of the inheritance diagram tree here: https://arrow.apache.org/docs/cpp/classarrow_1_1_flat_array.html. For example, strings in an arrow::StringArray are accessed using a GetString function instead of a Value function.

This is just what I've pieced together from these links, johnathan's comment above, and playing around with a small example myself, so I'm not sure if this is the best way, as I'm quite new to this.

Alta Fang
  • One potential issue with this code is that it assumes all values in the Array are non-null. If one assumes there are no null values, then instead of the loop I think you can use std::vector<double> double_vector(arrow_double_array->raw_values(), arrow_double_array->raw_values() + array->length()); which could be more succinct (or at least presizing the vector would be appropriate). – Micah Kornfield Nov 20 '19 at 08:16
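
The range-construction idea from the comment above can be sketched as follows, with a guard that refuses to copy when the array contains nulls. CopyIfNoNulls is a hypothetical helper, not part of Arrow; it assumes only null_count(), raw_values() and length(), which arrow::DoubleArray exposes. FakeDenseArray is a stand-in so the sketch compiles without the Arrow headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): bulk-copies the value buffer into
// `out` only when the array has no nulls. Returns false and leaves `out`
// untouched otherwise, so the caller can fall back to a per-element loop.
template <typename ValueT, typename ArrayT>
bool CopyIfNoNulls(const ArrayT& typed_array, std::vector<ValueT>* out) {
    if (typed_array.null_count() != 0) {
        return false;
    }
    const ValueT* first = typed_array.raw_values();
    out->assign(first, first + typed_array.length());
    return true;
}

// Stand-in for arrow::DoubleArray, used only to keep this sketch
// independent of the Arrow library.
struct FakeDenseArray {
    std::vector<double> values;
    int64_t nulls;
    int64_t length() const { return static_cast<int64_t>(values.size()); }
    int64_t null_count() const { return nulls; }
    const double* raw_values() const { return values.data(); }
};
```

The boolean return is a design choice for this sketch: it keeps the fast path explicit and makes the caller decide what to do with nulls instead of silently copying whatever sits in those slots.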