I want to join two Arrow tables on a common attribute. Does Arrow have a C++ API for this? I found something called HashJoin, but I am not sure whether it can be used to join two tables. Any pointers would be immensely helpful.

1 Answer

If you are working with the C++ API then a join can be achieved with an ExecPlan. The ExecPlan API is still marked experimental but it should have some updated documentation soon. An example is being added as part of this PR. The crux of the example is:

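  // Assumed context (not shown here): `namespace cp = arrow::compute;`, a `plan`
  // holding a cp::ExecPlan, and scan options for each input.
  cp::ExecNode* left_source;
  cp::ExecNode* right_source;

  // Add one "scan" source node per input.
  ARROW_ASSIGN_OR_RAISE(left_source,
                        cp::MakeExecNode("scan", plan.get(), {}, l_scan_node_options));
  ARROW_ASSIGN_OR_RAISE(right_source,
                        cp::MakeExecNode("scan", plan.get(), {}, r_scan_node_options));

  // Inner join matching left column "lkey" against right column "rkey".
  arrow::compute::HashJoinNodeOptions join_opts{arrow::compute::JoinType::INNER,
                                                /*in_left_keys=*/{"lkey"},
                                                /*in_right_keys=*/{"rkey"}};

  // Add the "hashjoin" node, fed by both sources.
  ARROW_ASSIGN_OR_RAISE(
      auto hashjoin,
      cp::MakeExecNode("hashjoin", plan.get(), {left_source, right_source}, join_opts));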

You can check out HashJoinNodeOptions here.
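
To make clear what an inner hash join on "lkey" and "rkey" computes, here is a minimal standalone sketch of the build/probe idea in plain C++. It does not use Arrow and is not how Arrow implements its hashjoin node; the row structs and the `InnerHashJoin` function are hypothetical names made up for illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row types for illustration only; these are not Arrow types.
struct LeftRow {
  std::int64_t lkey;
  std::string lval;
};
struct RightRow {
  std::int64_t rkey;
  std::string rval;
};
struct JoinedRow {
  std::int64_t key;
  std::string lval;
  std::string rval;
};

// Inner join: emit one output row for every (left, right) pair with lkey == rkey.
std::vector<JoinedRow> InnerHashJoin(const std::vector<LeftRow>& left,
                                     const std::vector<RightRow>& right) {
  // Build phase: index the right-hand rows by their key.
  // A multimap is used because several right rows may share a key.
  std::unordered_multimap<std::int64_t, const RightRow*> index;
  index.reserve(right.size());
  for (const RightRow& r : right) {
    index.emplace(r.rkey, &r);
  }

  // Probe phase: look up each left row's key and emit every match.
  // Left rows without a match are dropped, which is what makes the join INNER.
  std::vector<JoinedRow> out;
  for (const LeftRow& l : left) {
    auto range = index.equal_range(l.lkey);
    for (auto it = range.first; it != range.second; ++it) {
      out.push_back({l.lkey, l.lval, it->second->rval});
    }
  }
  return out;
}
```

Calling `InnerHashJoin` with left keys {1, 2, 3} and right keys {2, 3, 3, 4} would yield three rows: one for key 2 and two for key 3. Output follows the order of the left rows, but the order among right rows sharing a key is not guaranteed.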

Pace