As practice, I tried joining two CSV files in Apache Beam on a condition, with no success. I have also checked multiple posts, but nothing works.

I have two tab-separated CSV input files, and I want to extract columns A and C from the primary (Master) file and column A from the Secondary file. The join condition is Master.A = Secondary.B.

Master: (image not included)

Secondary: (image not included)

I tried creating a separate PCollection for each of them, but I do not know how to proceed further. Also, this CSV join is only one of the cases. How would I handle a combination of multiple file types (CSV, JSON, xyz)?

Vadim Kotov
zigbee

1 Answer

To join your collections, you will want to extract the join columns as keys and then use the CoGroupByKey transform. The output of that transform is one element per key that contains the join results from each input.
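Here is a minimal sketch of that approach in the Python SDK, including one way to turn the nested CoGroupByKey output into flat, relational-style rows. It rests on several assumptions that are not in the question: columns A, B and C are the first, second and third tab-separated fields, both files have a single header row, and an inner join is wanted. The helper names (`key_master`, `key_secondary`, `flatten_join`, `run`) and the `"master"`/`"secondary"` tags are made up for illustration.

```python
import csv


def key_master(line):
    # Assumption: Master columns A and C are fields 0 and 2.
    cols = next(csv.reader([line], delimiter="\t"))
    return cols[0], (cols[0], cols[2])  # key = Master.A


def key_secondary(line):
    # Assumption: Secondary columns A and B are fields 0 and 1.
    cols = next(csv.reader([line], delimiter="\t"))
    return cols[1], cols[0]  # key = Secondary.B, value = Secondary.A


def flatten_join(element):
    # CoGroupByKey emits (key, {"master": [...], "secondary": [...]}).
    # Yield one flat row per matching pair, which gives inner-join semantics:
    # keys missing on either side produce no rows.
    _key, groups = element
    for master_a, master_c in groups["master"]:
        for secondary_a in groups["secondary"]:
            yield (master_a, master_c, secondary_a)


def run(master_path, secondary_path, output_prefix):
    # Imported here so the helpers above can be tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as p:
        master = (
            p
            | "ReadMaster" >> beam.io.ReadFromText(master_path, skip_header_lines=1)
            | "KeyMaster" >> beam.Map(key_master)
        )
        secondary = (
            p
            | "ReadSecondary" >> beam.io.ReadFromText(secondary_path, skip_header_lines=1)
            | "KeySecondary" >> beam.Map(key_secondary)
        )
        (
            {"master": master, "secondary": secondary}
            | "Join" >> beam.CoGroupByKey()
            | "Flatten" >> beam.FlatMap(flatten_join)
            | "Format" >> beam.Map(lambda row: "\t".join(row))
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

The join itself only sees `(key, value)` pairs, so mixing formats should mostly come down to swapping the parsing step: a JSON source, for instance, could use `json.loads` in its own keying function and then feed the same `CoGroupByKey`.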

Kenn Knowles

I did the same, but the output is nested. Any pointers to get the output in relational format? – zigbee Feb 19 '21 at 11:53