
I am trying to join two Arrow tables where some columns are of list<float> data type. Note that my join keys are primitive data types, while some of my non-key columns are of type list<float>. However, PyArrow's join() cannot join such tables, although pandas can. It says

ArrowInvalid: Data type list<item: float> is not supported in join non-key field

when I execute this piece of code

joined_table = table_1.join(table_2, ['k1', 'k2', 'k3'])

Any ideas on how to fix or work around this issue would be helpful. Thanks.

1 Answer


I think PyArrow's join currently doesn't support some column types. See the criteria for allowed types here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec/hash_join_node.cc#L48

I believe the issue (though I might be wrong) is that list is not a fixed-width type, so it currently cannot be processed as a non-key field. You might want to open a Jira issue about this.

Rok